Where is the root Azure Storage instance?
I am trying to access logs from my Databricks notebook, which is run as a job. I would like to see these logs in an Azure storage account.
I work with Parquet files stored in AWS S3 buckets. They are multiple TB in size and partitioned by a numeric column containing integer values between 1 and 200; call it my_partition. I read in and perform compute actions on this data in Databricks w...
I have ad-hoc, one-time streaming queries where I believe checkpointing won't add any value. Should I still use checkpointing?
It's not mandatory, but the strong recommendation is to use checkpointing for streaming irrespective of your use case. This is because the default checkpoint location can accumulate a lot of files over time, as there is no graceful, guaranteed cleaning in pla...
It's preferable to use Spark Structured Streaming (with Delta) for batch workloads rather than regular batch. With the Trigger.Once trigger, whenever the streaming job is started it will process whatever is available in the source (Kafka/Kinesis/file system) and ...
The streaming checkpoint mechanism is independent of the trigger type. The way checkpointing works is that it creates an offset file when processing a batch, and once the batch is completed it creates a commit file for that batch in the checkpoint director...
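As a rough illustration of the two replies above, here is a minimal sketch (not from the original thread) of a batch-style streaming job that combines Trigger.Once with an explicit checkpoint location; the source format, schema, and paths are hypothetical placeholders.
%python
# Minimal sketch: a streaming query run batch-style with Trigger.Once and an
# explicit checkpoint location. All paths and the schema are hypothetical.
source_df = (
    spark.readStream                      # `spark` is the notebook's SparkSession
    .format("json")                       # any streaming source: Kafka, Kinesis, files, ...
    .schema("id INT, event_time TIMESTAMP")
    .load("/mnt/raw/events/")
)

query = (
    source_df.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/events/")  # offset and commit files land here
    .trigger(once=True)                   # process whatever is available, then stop
    .start("/mnt/delta/events/")
)
query.awaitTermination()
Because the checkpoint records which offsets were committed, re-running the same job later picks up only the data that arrived since the previous run.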
I have an S3-SQS workload. Is it possible to migrate the workload to Auto Loader without downtime? What are the migration guidelines?
The SQS queue used by the existing application can be utilized by Auto Loader, thereby ensuring minimal downtime.
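As a hedged sketch of that setup (not taken from the thread): Auto Loader's file-notification mode can be pointed at an existing queue via the cloudFiles.queueUrl option; verify the exact option names against the Auto Loader documentation for your runtime. The queue URL, schema location, and paths below are placeholders.
%python
# Sketch: Auto Loader reading from S3 via an existing SQS queue. All values are placeholders.
df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useNotifications", "true")  # file-notification mode
    .option("cloudFiles.queueUrl",                  # reuse the queue from the old S3-SQS job
            "https://sqs.us-east-1.amazonaws.com/123456789012/my-existing-queue")
    .option("cloudFiles.schemaLocation", "/mnt/checkpoints/autoloader/schema/")
    .load("s3://my-bucket/input/")
)

(
    df.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/autoloader/")
    .start("/mnt/delta/autoloader_output/")
)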
The issue can happen if the Hive syntax for table creation is used instead of the Spark syntax. Read more here: https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-syntax-ddl-create-table-hiveformat.html The issue mentioned in t...
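To make the distinction concrete, here is a minimal, hypothetical contrast between the two forms (table names and columns are placeholders):
%python
# Spark syntax (USING <format>) -- the form the linked docs recommend:
spark.sql("""
  CREATE TABLE IF NOT EXISTS events_spark (id INT, ts TIMESTAMP)
  USING DELTA
""")

# Hive syntax (STORED AS ...) -- the form that can trigger the issue described above:
spark.sql("""
  CREATE TABLE IF NOT EXISTS events_hive (id INT, ts TIMESTAMP)
  STORED AS PARQUET
""")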
I have a Delta table that had schema changes in multiple commits. I want to track all the schema changes that happened on the Delta table. "DESCRIBE HISTORY" is not useful here, as it only logs the schema changes made by ALTER TABLE operations.
When a write operation is performed with columns added, we do not explicitly show that in the DESCRIBE HISTORY output. Only an entry is made for the write, and the operationParameters do not show anything about schema evolution, whereas if we d...
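A small sketch of the behavior described above, with a hypothetical table name: an append with mergeSchema adds a column, but DESCRIBE HISTORY only records a WRITE operation.
%python
# Hypothetical illustration: schema evolution via a write, then inspect the history.
new_df = spark.createDataFrame(
    [(1, "click", "web")],
    ["id", "event_type", "channel"]   # `channel` is a new column for the table
)

(
    new_df.write.format("delta")
    .mode("append")
    .option("mergeSchema", "true")    # let the new column be added
    .saveAsTable("events")            # hypothetical existing Delta table
)

# The history shows a WRITE entry, but operationParameters says nothing about the
# schema change, unlike an explicit ALTER TABLE ... ADD COLUMNS.
spark.sql("DESCRIBE HISTORY events") \
    .select("version", "operation", "operationParameters") \
    .show(truncate=False)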
Yes, it's possible to use the Kafka API to connect to Event Hubs. Event Hubs supports the usage of the Kafka API to stream data from the Event Hub. Reference: https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-for-kafka-ecosystem-overview Sample pr...
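A hedged sketch of that connection (namespace, Event Hub name, and secret scope are placeholders, not values from the thread); on Databricks the Kafka client classes are shaded, so the JAAS login module below uses the kafkashaded prefix, which is worth verifying against your runtime.
%python
# Sketch: read from Azure Event Hubs through the Kafka API. All values are placeholders.
EH_NAMESPACE = "my-namespace"
EH_CONNECTION_STRING = dbutils.secrets.get("my-scope", "eventhub-connection-string")

jaas_config = (
    "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required "
    f'username="$ConnectionString" password="{EH_CONNECTION_STRING}";'
)

df = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", f"{EH_NAMESPACE}.servicebus.windows.net:9093")
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option("kafka.sasl.jaas.config", jaas_config)
    .option("subscribe", "my-event-hub")   # the Event Hub name acts as the Kafka topic
    .load()
)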
How can I change the log level of the Spark driver and executor processes?
Change the log level of the driver:
%scala
spark.sparkContext.setLogLevel("DEBUG")
spark.sparkContext.setLogLevel("INFO")
Change the log level of a particular package in the driver logs:
%scala
org.apache.log4j.Logger.getLogger("shaded.databricks.v201809...
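The same package-level adjustment can be made from Python through the JVM gateway; the package name below is an illustrative placeholder, not the one truncated above.
%python
# Sketch: change the log level of one package in the driver logs from Python.
# The package name is a hypothetical example.
log4j = spark._jvm.org.apache.log4j
log4j.Logger.getLogger("org.apache.spark.scheduler").setLevel(log4j.Level.DEBUG)

# Restore the overall driver log level afterwards if needed.
spark.sparkContext.setLogLevel("INFO")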
The cluster is idle and there are no Spark jobs running on the Spark UI. Still, I see that my cluster is active and not getting terminated.
A Databricks cluster is treated as active if there are any Spark or non-Spark operations running on it. Even though there are no Spark jobs running on the cluster, it's possible to have some driver-specific application code running, marking th...
Disclaimer: This code snippet uses an internal API. It's not recommended to use internal APIs in your application, as they are subject to change or discontinuation.
%python
import requests
API_URL = dbutils.notebook.entry_point.getDbutils().notebook(...
No, currently it's not configurable. Syncing of notebooks to Git repositories is configurable and can be turned off at the workspace level.
I have a JAR job that was migrated from EMR to Databricks. The job runs as expected and completes all the operations in the application. However, the job run is marked as failed in the Databricks Jobs UI.
Usage of spark.stop(), sc.stop(), or System.exit() in your application can cause this behavior. Databricks manages the context shutdown on its own; forcefully closing it can cause this abrupt behavior.
A few things you should not do in Databricks!
Compared to OSS Spark, these are a few things users don't have to worry about when running the same job on Databricks. Memory management: Databricks uses an internal formula to allocate the driver and executor heap based on the size of the instance....
Although not a hard limit, it's recommended to keep the number of cells in a notebook below 100 for a better UI experience as well as code readability. Having a really large block of code in a single cell defeats the purpose of notebook execution and al...