Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

brickster_2018
by Esteemed Contributor
  • 1935 Views
  • 1 replies
  • 0 kudos

Resolved! Is it mandatory to checkpoint my streaming query?

I have ad hoc, one-time streaming queries where I believe checkpointing won't add any value. Should I still use checkpointing?

Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

It's not mandatory, but the strong recommendation is to use checkpointing for streaming irrespective of your use case. This is because the default checkpoint location can accumulate a lot of files over time, as there is no guaranteed graceful cleanup in place...
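A minimal sketch of setting an explicit checkpoint location instead of relying on the default; the source, target, and checkpoint paths below are placeholders, not anything from the original question.

%python
# Explicit checkpoint location: easy to find, reuse, and clean up when the query is retired.
(spark.readStream
    .format("delta")
    .load("/mnt/source/events")                                      # placeholder source path
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/events_query")   # placeholder checkpoint path
    .start("/mnt/target/events"))                                    # placeholder target path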

  • 0 kudos
User16783855534
by New Contributor III
  • 1205 Views
  • 2 replies
  • 0 kudos

Should/Can I use Spark Streaming for batch workloads?

It's preferable to use Spark Streaming (with Delta) for batch workloads rather than regular batch jobs. With the trigger.once trigger, whenever the streaming job is started it will process whatever is available in the source (Kafka/Kinesis/file system) and ...
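A minimal sketch of that pattern, assuming a Delta source and sink; the paths are placeholders, and on newer runtimes trigger(availableNow=True) serves the same purpose as the one-time trigger.

%python
# Incremental "batch" run: process everything currently available in the source, then stop.
(spark.readStream
    .format("delta")
    .load("/mnt/landing/events")                                          # placeholder source
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/landing_to_bronze")   # placeholder checkpoint
    .trigger(once=True)                                                   # run once over available data
    .start("/mnt/bronze/events"))                                         # placeholder target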

Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

The streaming checkpoint mechanism is independent of the trigger type. The way checkpointing works is that it creates an offset file when it starts processing a batch, and once the batch is completed it creates a commit file for that batch in the checkpoint directory...
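For illustration, a quick way to look at those files for a given query; the checkpoint path is a placeholder, and the offsets/commits sub-directories are the standard Structured Streaming checkpoint layout.

%python
# Each micro-batch writes an offset file before processing and a commit file after it finishes.
checkpoint = "/mnt/checkpoints/events_query"          # placeholder checkpoint path
display(dbutils.fs.ls(checkpoint + "/offsets"))       # written at batch start
display(dbutils.fs.ls(checkpoint + "/commits"))       # written at batch completion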

  • 0 kudos
1 More Replies
brickster_2018
by Esteemed Contributor
  • 746 Views
  • 1 replies
  • 0 kudos

How to migrate to Auto Loader without downtime?

I have an S3-SQS workload. Is it possible to migrate the workload to Auto Loader without downtime? What are the migration guidelines?

Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

The SQS queue used by the existing application can be reused by Auto Loader, thereby ensuring minimal downtime.
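A hedged sketch of what that could look like with Auto Loader's file-notification mode; the cloudFiles option names follow the Auto Loader documentation, and the queue URL, file format, and paths are placeholders for this example.

%python
# Point Auto Loader at the SQS queue the existing S3-SQS source already consumes from.
df = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")                 # placeholder file format
    .option("cloudFiles.useNotifications", "true")       # file-notification mode
    .option("cloudFiles.queueUrl",
            "https://sqs.us-east-1.amazonaws.com/123456789012/my-existing-queue")  # placeholder queue
    .load("s3://my-bucket/landing/"))                    # placeholder path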

  • 0 kudos
brickster_2018
by Esteemed Contributor
  • 1056 Views
  • 1 replies
  • 0 kudos
Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

The issue can happen if the Hive syntax for table creation is used instead of the Spark syntax. Read more here: https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-syntax-ddl-create-table-hiveformat.html The issue mentioned in t...
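To make the distinction concrete, a small sketch of the two DDL styles the reply contrasts; the table names and columns are placeholders.

%python
# Hive-format DDL (goes through the Hive SerDe path):
spark.sql("CREATE TABLE demo_hive (id INT, name STRING) STORED AS PARQUET")

# Spark data source DDL (uses Spark's native Parquet reader/writer):
spark.sql("CREATE TABLE demo_spark (id INT, name STRING) USING PARQUET")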

  • 0 kudos
brickster_2018
by Esteemed Contributor
  • 3331 Views
  • 1 replies
  • 0 kudos

Resolved! How to track the history of schema changes for a Delta table

I have a Delta table that had schema changes in multiple commits. I want to track all the schema changes that happened on the Delta table. DESCRIBE HISTORY is not enough, as it only logs the schema changes made by ALTER TABLE operations.

Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

When a write operation adds columns, that is not explicitly shown in the DESCRIBE HISTORY output. Only an entry is made for the write, and the operationParameters do not show anything about schema evolution, whereas if we d...
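One way to see every schema version anyway is to scan the Delta transaction log, where each schema change is recorded as a metaData action. A rough sketch follows; the table path is a placeholder, and since it reads the JSON log files directly, treat it as an inspection aid rather than a supported API.

%python
import json

log_dir = "/mnt/delta/events/_delta_log"                       # placeholder table path
log_files = sorted(f.path for f in dbutils.fs.ls(log_dir) if f.path.endswith(".json"))

for path in log_files:
    for row in spark.read.text(path).collect():
        action = json.loads(row.value)
        if "metaData" in action:                               # schema (re)definition for this commit
            print(path)
            print(action["metaData"]["schemaString"])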

  • 0 kudos
brickster_2018
by Esteemed Contributor
  • 2313 Views
  • 1 replies
  • 0 kudos
Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

Yes, it's possible to use the Kafka API to connect to Event Hubs. Event Hubs supports using the Kafka API to stream data from the Event Hub. Reference: https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-for-kafka-ecosystem-overview Sample pr...
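A hedged sketch of that connection from a Databricks notebook, based on the Kafka-on-Event-Hubs pattern in the linked docs; the namespace, event hub name, and connection string are placeholders, and the shaded login-module class name is what Databricks clusters typically expect.

%python
# Read from an Event Hub through its Kafka endpoint (port 9093, SASL_SSL / PLAIN).
connection_string = "Endpoint=sb://<NAMESPACE>.servicebus.windows.net/;SharedAccessKeyName=<KEY_NAME>;SharedAccessKey=<KEY>"
eh_sasl = (
    'kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required '
    'username="$ConnectionString" password="{}";'.format(connection_string)
)

df = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "<NAMESPACE>.servicebus.windows.net:9093")
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option("kafka.sasl.jaas.config", eh_sasl)
    .option("subscribe", "<EVENT_HUB_NAME>")
    .load())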

  • 0 kudos
brickster_2018
by Esteemed Contributor
  • 14019 Views
  • 1 replies
  • 0 kudos

Resolved! How do I change the log level in Databricks?

How can I change the log level of the Spark Driver and executor process?

Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

Change the log level of the driver:
%scala
spark.sparkContext.setLogLevel("DEBUG")
spark.sparkContext.setLogLevel("INFO")

Change the log level of a particular package in the driver logs:
%scala
org.apache.log4j.Logger.getLogger("shaded.databricks.v201809...
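A hedged Python equivalent, assuming log4j 1.x on the driver; the package name below is an illustrative placeholder, not the one truncated in the reply above.

%python
# Driver log level for Spark's own logging.
spark.sparkContext.setLogLevel("DEBUG")

# Per-package level via the JVM log4j API (assumes log4j 1.x on the cluster).
log4j = spark._jvm.org.apache.log4j
log4j.Logger.getLogger("org.apache.spark.scheduler").setLevel(log4j.Level.INFO)   # placeholder package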

  • 0 kudos
brickster_2018
by Esteemed Contributor
  • 1245 Views
  • 1 replies
  • 0 kudos

Resolved! I do not have any Spark jobs running, but my cluster is not getting auto-terminated.

The cluster is idle and there are no Spark jobs running on the Spark UI. Still, I see my cluster is active and not getting terminated.

Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

A Databricks cluster is treated as active if there are any Spark or non-Spark operations running on it. Even though there are no Spark jobs running on the cluster, it's possible to have some driver-specific application code running, marking the cluster as active...

  • 0 kudos
brickster_2018
by Esteemed Contributor
  • 2686 Views
  • 1 replies
  • 0 kudos
Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

Disclaimer: This code snippet uses an internal API. It's not recommended to use internal APIs in your application as they are subject to change or discontinuation.
%python
import requests
API_URL = dbutils.notebook.entry_point.getDbutils().notebook(...

  • 0 kudos
brickster_2018
by Esteemed Contributor
  • 1704 Views
  • 1 replies
  • 0 kudos

Resolved! Why do I see my job marked as failed on the Databricks Jobs UI, even though it completed the operations in the application?

I have a JAR job that was migrated from EMR to Databricks. The job runs as expected and completes all the operations in the application. However, the job run is marked as failed on the Databricks Jobs UI.

Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

Using spark.stop(), sc.stop(), or System.exit() in your application can cause this behavior. Databricks manages the context shutdown on its own; forcefully closing it can cause this abrupt behavior.

  • 0 kudos
brickster_2018
by Esteemed Contributor
  • 814 Views
  • 1 replies
  • 2 kudos

Few things you should not do in Databricks!

Few things you should not do in Databricks!

Latest Reply
brickster_2018
Esteemed Contributor
  • 2 kudos

Compared to OSS Spark, these are a few things users don't have to worry about when running the same job on Databricks. Memory management: Databricks uses an internal formula to allocate the driver and executor heap based on the size of the instance...

  • 2 kudos
brickster_2018
by Esteemed Contributor
  • 1944 Views
  • 1 replies
  • 0 kudos
Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

Although there is no hard limit, it's recommended to keep the number of cells in a notebook below 100 for a better UI experience as well as code readability. Having a really large block of code in a cell defeats the purpose of notebook execution and al...

  • 0 kudos
brickster_2018
by Esteemed Contributor
  • 16669 Views
  • 1 replies
  • 0 kudos
Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

Yes, it's possible to download files from DBFS. Files stored in /FileStore are accessible in your web browser at https://<databricks-instance-name>.cloud.databricks.com/files/. For example, the file you stored in /FileStore/my-da...
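A small sketch of the workflow described above; the file names and workspace URL are placeholders (the truncated path in the reply is not being reconstructed here).

%python
# Copy a file into /FileStore so it becomes reachable over HTTPS from the browser.
dbutils.fs.cp("dbfs:/tmp/report.csv", "dbfs:/FileStore/report.csv")    # placeholder file
print("https://<databricks-instance-name>.cloud.databricks.com/files/report.csv")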

  • 0 kudos
User16783853501
by New Contributor II
  • 957 Views
  • 2 replies
  • 0 kudos

What is the best way to convert a very large Parquet table to Delta, possibly without downtime?

What is the best way to convert a very large Parquet table to Delta, possibly without downtime?

Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

I vouch for Sajith's answer. The main advantage of "CONVERT TO DELTA" is that the operation is metadata-centric, which means we are not reading the full data for the conversion. For any other file format conversion, it's necessary to read the data completely...
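For reference, a minimal sketch of the command the reply refers to; the table path and partition column are placeholders.

%python
# CONVERT TO DELTA writes a Delta transaction log over the existing Parquet files in place,
# so the data files themselves are not rewritten.
spark.sql("CONVERT TO DELTA parquet.`/mnt/datalake/events`")

# If the Parquet data is partitioned, the partition schema must be supplied, e.g.:
# spark.sql("CONVERT TO DELTA parquet.`/mnt/datalake/events` PARTITIONED BY (event_date DATE)")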

  • 0 kudos
1 More Replies