
Are there any recommended Spark config settings for Delta/Databricks?

alejandrofm
Valued Contributor

Hi! I'm starting to test configs on Databricks. For example, to avoid corrupting data if two processes try to write at the same time:

.config('spark.databricks.delta.multiClusterWrites.enabled', 'false')

Or, if I need more partitions than the default:

.config('spark.databricks.adaptive.autoOptimizeShuffle.enabled', 'true')

Are there any other recommended default settings? (The tuning for each job would come after that.)

Thanks!

Accepted Solution

Ryan_Chynoweth
Esteemed Contributor

Delta tables use optimistic concurrency control. If two processes try to write to the same table at the same time, Delta does its best to commit both, but if the transactions conflict, one of them will fail. You can also change the table's isolation level if you want to enforce more control over that.
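
To make the idea concrete, here is a toy sketch of optimistic concurrency in plain Python (standard library only). It is not Delta's actual implementation, just the general pattern: each writer notes the latest version it has seen, then tries to create the next numbered commit file atomically, and whoever loses the race gets an error and can retry. The names `CommitConflict`, `latest_version`, `try_commit`, and `demo` are made up for this example.

```python
import os
import tempfile


class CommitConflict(Exception):
    """Raised when another writer already claimed the version (hypothetical name)."""


def latest_version(log_dir):
    """Return the highest committed version in log_dir, or -1 if there is none."""
    versions = [int(name.split(".")[0]) for name in os.listdir(log_dir)
                if name.endswith(".json")]
    return max(versions, default=-1)


def try_commit(log_dir, read_version, payload):
    """Try to commit as version read_version + 1; fail if it already exists."""
    path = os.path.join(log_dir, f"{read_version + 1:020d}.json")
    try:
        # Mode "x" creates the file exclusively and fails if it already exists.
        with open(path, "x") as f:
            f.write(payload)
    except FileExistsError:
        raise CommitConflict(f"version {read_version + 1} was already committed")
    return read_version + 1


def demo():
    with tempfile.TemporaryDirectory() as log_dir:
        # Both writers start from the same snapshot of the (empty) log.
        seen_by_a = latest_version(log_dir)
        seen_by_b = latest_version(log_dir)
        results = [try_commit(log_dir, seen_by_a, '{"writer": "A"}')]
        try:
            try_commit(log_dir, seen_by_b, '{"writer": "B"}')
        except CommitConflict:
            # Writer B lost the race: re-read the log and retry on top of A.
            results.append(
                try_commit(log_dir, latest_version(log_dir), '{"writer": "B"}'))
        return results
```

In this toy version every race counts as a conflict. As far as I understand it, Delta itself is smarter and only fails a commit when the concurrent changes actually conflict. On Databricks the isolation level is, to my knowledge, a table property (`delta.isolationLevel`), so check the docs for the values your runtime supports before changing it.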


Hubert-Dudek
Esteemed Contributor III

Exactly. You can easily verify this, because each commit is written to a separate file in the Delta log.
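
A quick way to look at this yourself, as a minimal sketch: commit files live in the table's `_delta_log` directory and are named with a zero-padded 20-digit version number followed by `.json`. The helper below (`list_commits` is a name I made up) assumes the table path is reachable through the local filesystem API. For a table on cloud storage you would need to list the directory with whatever tool your environment provides instead of `os.listdir`.

```python
import os
import re

# Commit files look like 00000000000000000000.json (20-digit version number).
COMMIT_FILE = re.compile(r"^(\d{20})\.json$")


def list_commits(table_path):
    """Return the sorted commit versions found in <table_path>/_delta_log."""
    log_dir = os.path.join(table_path, "_delta_log")
    versions = []
    for name in os.listdir(log_dir):
        match = COMMIT_FILE.match(name)
        if match:  # ignores checkpoints, .crc files, _last_checkpoint, etc.
            versions.append(int(match.group(1)))
    return sorted(versions)
```

If two jobs wrote to the table at the same time and both succeeded, you should see two consecutive version files, one per commit.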

Regarding:

.config('spark.databricks.adaptive.autoOptimizeShuffle.enabled', 'true')

and other Spark optimization options, please watch the Databricks video on the topic: https://www.youtube.com/watch?v=daXEp4HmS-E

It helped, but I'm still testing different configurations. Thank you!

Anonymous
Not applicable

Hey there @Alejandro Martinez,

Hope everything is going well.

Just wanted to see if you were able to find an answer to your question. If yes, would you be happy to let us know and mark it as best so that other members can find the solution more quickly?

Cheers!
