Databricks Community

alejandrofm · ‎02-28-2022

Hi! I'm starting to test configs on Databricks. For example, to avoid corrupting data if two processes try to write at the same time:

.config('spark.databricks.delta.multiClusterWrites.enabled', 'false')

Or, if I need more partitions than the default:

.config('spark.databricks.adaptive.autoOptimizeShuffle.enabled', 'true')

Are there any other recommended default settings? (The tuning for each job would come after that.)

Thanks!

Ryan_Chynoweth · ‎02-28-2022

Delta tables use optimistic concurrency control. If two processes try to write to the same table, Delta does its best to commit both, but if the transactions conflict, one of them will fail. You can also change the table's isolation level if you want tighter control over that.
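A minimal sketch of how a writer might cope with such a failure: retry the write when a conflict is raised. The helper name `retry_on_conflict`, its parameters, and the backoff values are hypothetical and not part of any Databricks API. On Delta you would pass the conflict exception classes from the `delta.exceptions` Python module (for example `ConcurrentAppendException`), which is assumed to be installed and is deliberately not imported here.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_on_conflict(
    write: Callable[[], T],
    conflict_errors: Tuple[Type[BaseException], ...],
    max_attempts: int = 3,
    base_delay_s: float = 0.5,
) -> T:
    """Run `write`, retrying only when it raises one of `conflict_errors`.

    Hypothetical helper: any other exception propagates immediately, and the
    last conflict is re-raised once `max_attempts` is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return write()
        except conflict_errors:
            if attempt == max_attempts:
                raise
            # Exponential backoff: base, 2x base, 4x base, ...
            time.sleep(base_delay_s * 2 ** (attempt - 1))
    # Only reached when the loop never ran.
    raise ValueError("max_attempts must be at least 1")
```

Usage might look like `retry_on_conflict(lambda: df.write.format("delta").mode("append").save(path), (ConcurrentAppendException,))`, where `df` and `path` are placeholders. Retrying is only safe if the write can be re-run without duplicating data. As for the isolation level, it is a table property (`delta.isolationLevel`) set with `ALTER TABLE ... SET TBLPROPERTIES`, not a Spark session config.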

Hubert-Dudek · ‎03-01-2022

Exactly. You can easily verify that, since each commit is written to a separate file in the Delta log.
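As a rough sketch of that check, assuming the table lives on a path the Python process can read directly (a local directory here; cloud storage would need a different listing mechanism), the commit files under `_delta_log` can be listed like this. The function name `list_commits` is hypothetical. It relies on Delta naming each commit file with a zero-padded 20-digit version number and a `.json` extension.

```python
import json
import re
from pathlib import Path
from typing import Dict, List

# Commit files look like 00000000000000000000.json; checkpoints and
# .crc files in the same directory do not match this pattern.
_COMMIT_FILE = re.compile(r"^(\d{20})\.json$")


def list_commits(table_path: str) -> List[Dict]:
    """Return one entry per commit file found in <table_path>/_delta_log."""
    log_dir = Path(table_path) / "_delta_log"
    commits = []
    for entry in sorted(log_dir.iterdir()):
        match = _COMMIT_FILE.match(entry.name)
        if not match:
            continue
        operation = None
        # Each line of a commit file is one JSON action.
        for line in entry.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "commitInfo" in action:
                operation = action["commitInfo"].get("operation")
        commits.append(
            {"version": int(match.group(1)), "file": entry.name, "operation": operation}
        )
    return commits
```

Two concurrent writers that both succeed should show up as two consecutive versions, each in its own file. Running `DESCRIBE HISTORY` on the table gives the same information without reading the log files by hand.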

Regarding:

.config('spark.databricks.adaptive.autoOptimizeShuffle.enabled', 'true')

and other Spark optimization techniques, please watch the Databricks video on the topic: https://www.youtube.com/watch?v=daXEp4HmS-E

alejandrofm · ‎03-17-2022

It helped, but I'm still testing different configurations. Thank you!

Anonymous · ‎04-28-2022

Hey there @Alejandro Martinez

Hope everything is going well.

Just wanted to see if you were able to find an answer to your question. If yes, would you be happy to let us know and mark it as best so that other members can find the solution more quickly?

Cheers!

Are there any recommended spark config settings for Delta/Databricks?