Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

brickster_2018
by Databricks Employee
  • 2256 Views
  • 1 replies
  • 0 kudos
Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

The issue can happen if the Hive syntax for table creation is used instead of the Spark syntax. Read more here: https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-syntax-ddl-create-table-hiveformat.html The issue mentioned in t...
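To illustrate the distinction the reply draws, here is a minimal sketch contrasting the two DDL forms. The table and column names are hypothetical, and the strings are built in Python only so the difference is easy to inspect; on a cluster they would be run with spark.sql.

```python
# Illustrative DDL strings only; "events" and its columns are hypothetical.

# Hive-format syntax (ROW FORMAT / STORED AS) -- the form that can trigger
# the issue described above:
hive_syntax = (
    "CREATE TABLE events (id INT, payload STRING) "
    "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
    "STORED AS TEXTFILE"
)

# Spark SQL syntax (USING <datasource>) -- the Spark-native form:
spark_syntax = (
    "CREATE TABLE events (id INT, payload STRING) "
    "USING DELTA"
)

# The presence of a USING clause (and absence of ROW FORMAT / STORED AS)
# is the quickest way to tell the two apart.
print("USING" in spark_syntax and "STORED AS" not in spark_syntax)  # -> True
```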

  • 0 kudos
brickster_2018
by Databricks Employee
  • 7027 Views
  • 1 replies
  • 0 kudos

Resolved! How to track the history of schema changes for a Delta table

I have a Delta table that had schema changes in multiple commits. I wanted to track all these schema changes that happened on the Delta table. The "DESCRIBE HISTORY" is not useful as it logs the schema change made by ALTER TABLE operations.

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

When a write operation is performed with columns added, we do not explicitly show that in the DESCRIBE HISTORY output. Only an entry is made for the write, and the operationParameters show nothing about schema evolution; whereas if we d...
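Since DESCRIBE HISTORY may not surface schema evolution on writes, one workaround is to scan the Delta transaction log directly for commits that carry a metaData action. The sketch below fakes a tiny _delta_log in a temp directory (all file names and contents are fabricated for illustration) and collects each commit that redefined the schema:

```python
# Sketch: track schema changes by reading Delta commit JSON files directly.
# The log contents here are fabricated; a real _delta_log holds one JSON
# action per line, and a "metaData" action appears when the schema changes.
import json
import os
import tempfile

log_dir = tempfile.mkdtemp()
commits = {
    "00000000000000000000.json": {"metaData": {"schemaString": '{"fields":["id"]}'}},
    "00000000000000000001.json": {"add": {"path": "part-0.parquet"}},  # plain write
    "00000000000000000002.json": {"metaData": {"schemaString": '{"fields":["id","name"]}'}},
}
for name, action in commits.items():
    with open(os.path.join(log_dir, name), "w") as f:
        f.write(json.dumps(action))

# Walk commits in order and keep every schema (re)definition.
schema_history = []
for name in sorted(os.listdir(log_dir)):
    with open(os.path.join(log_dir, name)) as f:
        for line in f:
            action = json.loads(line)
            if "metaData" in action:
                schema_history.append((name, action["metaData"]["schemaString"]))

print(len(schema_history))  # -> 2: only commits 0 and 2 changed the schema
```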

  • 0 kudos
brickster_2018
by Databricks Employee
  • 3907 Views
  • 1 replies
  • 0 kudos
Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

Yes, it's possible to use the Kafka API to connect to the Eventhub. Eventhub supports the usage of the Kafka API to stream data from the Eventhub. Reference: https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-for-kafka-ecosystem-overview Sample pr...
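As a sketch of what that connection typically looks like, the options below target the Event Hubs Kafka endpoint (port 9093 with SASL_SSL/PLAIN). The namespace, topic, and connection string are placeholders, not real values; on a cluster this dict would feed spark.readStream.format("kafka").options(**kafka_options).

```python
# Hedged sketch of Kafka source options for Azure Event Hubs' Kafka endpoint.
# All names and the connection string below are placeholders.
namespace = "my-eventhub-namespace"          # hypothetical namespace
connection_string = "Endpoint=sb://...;..."  # deliberately left elided

kafka_options = {
    # Event Hubs exposes its Kafka endpoint on port 9093
    "kafka.bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
    "kafka.security.protocol": "SASL_SSL",
    "kafka.sasl.mechanism": "PLAIN",
    # "$ConnectionString" is the literal username; the connection string is the password
    "kafka.sasl.jaas.config": (
        "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule "
        f'required username="$ConnectionString" password="{connection_string}";'
    ),
    "subscribe": "my-topic",  # the Event Hub name acts as the Kafka topic
}

print(kafka_options["kafka.bootstrap.servers"])
```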

  • 0 kudos
brickster_2018
by Databricks Employee
  • 19164 Views
  • 1 replies
  • 2 kudos

Resolved! How do I change the log level in Databricks?

How can I change the log level of the Spark Driver and executor process?

Latest Reply
brickster_2018
Databricks Employee
  • 2 kudos

Change the log level of the Driver:

%scala
spark.sparkContext.setLogLevel("DEBUG")
spark.sparkContext.setLogLevel("INFO")

Change the log level of a particular package in the Driver logs:

%scala
org.apache.log4j.Logger.getLogger("shaded.databricks.v201809...

  • 2 kudos
brickster_2018
by Databricks Employee
  • 5970 Views
  • 1 replies
  • 1 kudos
Latest Reply
brickster_2018
Databricks Employee
  • 1 kudos

Disclaimer: This code snippet uses an internal API. It is not recommended to use internal APIs in your application, as they are subject to change or removal.

%python
import requests
API_URL = dbutils.notebook.entry_point.getDbutils().notebook(...

  • 1 kudos
brickster_2018
by Databricks Employee
  • 3463 Views
  • 1 replies
  • 0 kudos

Resolved! Why do I see my job marked as failed on the Databricks Jobs UI, even though it completed the operations in the application?

I have a jar job migrated from EMR to Databricks. The job runs as expected and completes all the operations in the application. However, the job run is marked as failed on the Databricks Jobs UI.

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

Usage of spark.stop(), sc.stop(), or System.exit() in your application can cause this behavior. Databricks manages the context shutdown on its own; forcefully closing it can cause this abrupt behavior.

  • 0 kudos
brickster_2018
by Databricks Employee
  • 1699 Views
  • 1 replies
  • 2 kudos

A few things you should not do in Databricks!

A few things you should not do in Databricks!

Latest Reply
brickster_2018
Databricks Employee
  • 2 kudos

Compared to OSS Spark, these are a few things users don't have to worry about when running the same job on Databricks. Memory management: Databricks uses an internal formula to allocate the Driver and executor heap based on the size of the instance....

  • 2 kudos
brickster_2018
by Databricks Employee
  • 3642 Views
  • 1 replies
  • 0 kudos
Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

Although not a hard limit, it's recommended to keep the number of cells in a notebook under 100 for a better UI experience as well as code readability. Having a really large block of code in a cell defeats the purpose of notebook execution and al...

  • 0 kudos
brickster_2018
by Databricks Employee
  • 23269 Views
  • 1 replies
  • 0 kudos
Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

Yes, it's possible to download files from DBFS. To download the files: Files stored in /FileStore are accessible in your web browser at https://<databricks-instance-name>.cloud.databricks.com/files/. For example, the file you stored in /FileStore/my-da...
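The path-to-URL mapping described above can be sketched as a small helper. This is an illustration, not a Databricks API; the workspace instance name is a placeholder.

```python
# Minimal sketch: map a DBFS /FileStore path to its browser-downloadable URL.
# The instance name is a placeholder, not a real workspace.
def filestore_url(dbfs_path, instance="my-workspace.cloud.databricks.com"):
    prefix = "/FileStore/"
    if not dbfs_path.startswith(prefix):
        raise ValueError("only /FileStore paths are served at /files/")
    # everything after /FileStore/ appears under /files/ on the workspace host
    return f"https://{instance}/files/{dbfs_path[len(prefix):]}"

print(filestore_url("/FileStore/my-data/report.csv"))
# -> https://my-workspace.cloud.databricks.com/files/my-data/report.csv
```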

  • 0 kudos
User16783853501
by Databricks Employee
  • 2066 Views
  • 2 replies
  • 0 kudos

What is the best way to convert a very large Parquet table to Delta, possibly without downtime?

What is the best way to convert a very large Parquet table to Delta, possibly without downtime?

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

I vouch for Sajith's answer. The main advantage of "CONVERT TO DELTA" is that the operation is metadata-centric, which means we are not reading the full data for the conversion. For any other file-format conversion, it's necessary to read the data com...
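As a sketch of the command in question, the statement below converts a Parquet directory in place. The path and partition column are hypothetical; the string is built in Python only for illustration, and on Databricks it would run via spark.sql(stmt).

```python
# Hedged sketch of the CONVERT TO DELTA statement discussed above.
# The path and partition column are hypothetical.
path = "/mnt/data/events"
stmt = f"CONVERT TO DELTA parquet.`{path}` PARTITIONED BY (date DATE)"

# The command only writes metadata (a _delta_log directory); the existing
# Parquet data files are left where they are, which is why it avoids a
# full rewrite and can be done with minimal downtime.
print(stmt)
```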

  • 0 kudos
1 More Replies
brickster_2018
by Databricks Employee
  • 2207 Views
  • 2 replies
  • 0 kudos

Why should I move to Auto-loader?

I have a streaming workload using the S3-SQS connector. The streaming job is running fine within the SLA. Should I migrate my job to use Auto Loader? If yes, what are the benefits? Who should migrate and who should not?

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

That makes sense @Anand Ladda! One major improvement that will have a direct impact on performance is the architectural difference. S3-SQS uses an internal implementation of the Delta table to store the checkpoint details about the source files...
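For readers weighing the migration, here is a sketch of the Auto Loader source options involved. The paths are hypothetical, and on a cluster the dict would feed spark.readStream.format("cloudFiles").options(**autoloader_options).load(source_path).

```python
# Hedged sketch of Auto Loader ("cloudFiles") source options.
# All paths below are hypothetical placeholders.
autoloader_options = {
    "cloudFiles.format": "json",                # format of the incoming files
    # file-notification mode, the Auto Loader analogue of S3-SQS's queue-based
    # discovery (the alternative is directory-listing mode)
    "cloudFiles.useNotifications": "true",
    "cloudFiles.schemaLocation": "/mnt/checkpoints/schema",  # hypothetical path
}
source_path = "/mnt/raw/events"  # hypothetical input directory

print(autoloader_options["cloudFiles.format"])
```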

  • 0 kudos
1 More Replies
aladda
by Databricks Employee
  • 3632 Views
  • 1 replies
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

Stats collected on a Delta column are used for partition pruning and data skipping. See here for details: https://docs.databricks.com/delta/optimizations/file-mgmt.html#delta-data-skipping. In addition, stats are also used for metadata-only q...
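Related to the data-skipping behavior above, Delta collects stats on the first N columns, controlled by a table property. The sketch below builds that ALTER TABLE statement as a string; the table name is hypothetical, and on Databricks it would run via spark.sql(stmt).

```python
# Hedged sketch: limit stats collection to the first 5 columns via the
# delta.dataSkippingNumIndexedCols table property. Table name is hypothetical.
stmt = (
    "ALTER TABLE events "
    "SET TBLPROPERTIES ('delta.dataSkippingNumIndexedCols' = '5')"
)
print(stmt)
```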

  • 0 kudos
User16783853501
by Databricks Employee
  • 2052 Views
  • 2 replies
  • 0 kudos

Delta Optimistic Transactions Resolution and Exceptions

What is the best way to deal with concurrent exceptions in Delta when you have multiple writers on the same delta table ?

Latest Reply
sajith_appukutt
Databricks Employee
  • 0 kudos

While you can try-catch-retry, it would be expensive to retry as the underlying table snapshot would have changed. So the best approach is to avoid conflicts using partitioning and disjoint command conditions as much as possible.
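When retries are unavoidable, the try-catch-retry pattern mentioned above usually pairs with backoff. This is a generic sketch: ConcurrentWriteError is a stand-in class, not Delta's real exception type, and real code would catch the Delta-specific concurrent-modification exception and re-read the table before retrying.

```python
# Generic try-catch-retry sketch with exponential backoff. The exception
# class is a hypothetical stand-in for Delta's concurrent-write exceptions.
import random
import time

class ConcurrentWriteError(Exception):
    pass

def write_with_retry(write_fn, max_attempts=3, base_delay=0.01):
    for attempt in range(1, max_attempts + 1):
        try:
            return write_fn()
        except ConcurrentWriteError:
            if attempt == max_attempts:
                raise
            # back off before retrying against the (now changed) table snapshot
            time.sleep(base_delay * (2 ** attempt) * random.random())

# Simulated writer that conflicts twice, then commits.
attempts = []
def flaky_write():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConcurrentWriteError("conflicting commit")
    return "committed"

result = write_with_retry(flaky_write)
print(result)  # -> committed (after 2 simulated conflicts)
```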

  • 0 kudos
1 More Replies
