Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Is Concurrent Writes from multiple databricks clusters to same delta table on S3 Supported?

ptambe
New Contributor III

Does Databricks support writing to the same Delta table from multiple clusters concurrently? I am specifically interested to know whether any solution for https://github.com/delta-io/delta/issues/41 is implemented in Databricks, or whether you have any recommendations for achieving concurrent writes to the same Delta table on S3.


6 REPLIES

Hubert-Dudek
Esteemed Contributor III

Usually yes, but it depends on partitioning. If you have two executors (writers) and each of them holds a partition that has to be appended to the Delta table, the writes happen per partition simultaneously. You can also analyze your exact use case by looking at the Jobs tab (and the other tabs) in the Spark UI.
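To illustrate the per-partition parallelism described above: within one cluster, Spark groups rows by the partition column and each writer task handles its own partition's files. A minimal pure-Python sketch of that grouping (the Spark usage in the comment is a hedged example with assumed paths and column names):

```python
from collections import defaultdict

def split_by_partition(rows, key):
    """Group rows by their partition value, mirroring how a partitioned
    Delta append assigns each partition's files to separate writer tasks."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[key]].append(row)
    return dict(parts)

# Hypothetical Spark equivalent on a single cluster (path/column assumed):
# (df.write.format("delta")
#    .mode("append")
#    .partitionBy("event_date")
#    .save("s3://bucket/events"))
```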

ptambe
New Contributor III

Yes, within the same cluster it works with multiple executors, and we use replaceWhere to overwrite separate partitions. Will the same thing work if the partitions are being written to from different job clusters? The issue I mentioned above indicates that this is not supported by Delta.
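For context on the replaceWhere pattern mentioned here: each writer restricts its overwrite to a disjoint partition via a predicate, so writers never touch the same files. A hedged sketch (table path and column names are assumptions, not from the thread):

```python
def partition_predicate(column, value):
    """Build a replaceWhere predicate confining a writer to one partition."""
    return f"{column} = '{value}'"

# Each job overwrites only its own partition (hypothetical Spark usage):
# (df.write.format("delta")
#    .mode("overwrite")
#    .option("replaceWhere", partition_predicate("event_date", "2024-01-01"))
#    .save("s3://bucket/events"))
```

Because the predicates select disjoint partitions, the writers' file sets do not overlap, which is what makes the concurrent overwrites safe.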

dennyglee
Databricks Employee

Please note, the issue noted above ([Storage System] Support for AWS S3 (multiple clusters/drivers/JVMs)) is for Delta Lake OSS. As noted in that issue, as well as Issue 324, as of this writing S3 lacks an atomic putIfAbsent operation, which is needed for transactional consistency. For Delta Lake OSS, the community is working on PR 339 to resolve this issue.

That said, your question is specific to Databricks' implementation of Delta, which allows multiple clusters to concurrently write to the same Delta table using the S3 commit service. The pertinent quote is:

Databricks runs a commit service that coordinates writes to Amazon S3 from multiple clusters. This service runs in the Databricks control plane

For more information, please refer to Configure Databricks S3 commit service-related settings
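Even with the commit service handling physical commits, concurrent writers can still hit logical conflicts (Delta raises concurrent-modification exceptions when two transactions touch overlapping data), and a common pattern is to retry with backoff. A minimal sketch of such a retry wrapper; the Spark usage in the comment is hypothetical, and you would narrow `exceptions` to Delta's actual concurrent-write exception types:

```python
import time

def retry_on_conflict(write_fn, exceptions, max_retries=3, base_delay=1.0):
    """Retry a write on transient concurrency conflicts.

    `write_fn` is any zero-argument callable performing the write;
    `exceptions` is the tuple of exception types to treat as retryable.
    """
    for attempt in range(max_retries + 1):
        try:
            return write_fn()
        except exceptions:
            if attempt == max_retries:
                raise
            # Exponential backoff before re-attempting the commit.
            time.sleep(base_delay * (2 ** attempt))

# Hypothetical usage from a Databricks job (path assumed):
# retry_on_conflict(
#     lambda: df.write.format("delta").mode("append").save("s3://bucket/table"),
#     exceptions=(Exception,),  # narrow to Delta's concurrent-write exceptions
# )
```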

ptambe
New Contributor III

Thanks @Denny Lee!

This is what I was looking for, and I assume this configuration is enabled by default.

dennyglee
Databricks Employee

Glad to help @Prashant Tambe - yes, this configuration is on by default. HTH!

prem14f
New Contributor II

Hi @dennyglee ,
If I am writing data into a Delta table using delta-rs and a Databricks job, but I lose some transactions, how can I handle this?

Given that Databricks runs a commit service and delta-rs uses DynamoDB for transaction logs, how can we handle concurrent writers from Databricks jobs and delta-rs writers on the same table?
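This question is left open in the thread. For the delta-rs side alone, its S3 writers opt into DynamoDB-based commit coordination via storage options; a hedged sketch below (the option names follow delta-rs's S3 locking configuration and should be verified against the delta-rs version in use; the table and region values are placeholders):

```python
def dynamodb_lock_options(lock_table, region="us-east-1"):
    """Storage options telling delta-rs to coordinate S3 commits via DynamoDB.

    Option names are taken from delta-rs's S3 locking configuration;
    verify them against the installed delta-rs version.
    """
    return {
        "AWS_S3_LOCKING_PROVIDER": "dynamodb",
        "DELTA_DYNAMO_TABLE_NAME": lock_table,
        "AWS_REGION": region,
    }

# Hypothetical usage with the deltalake package:
# from deltalake import write_deltalake
# write_deltalake("s3://bucket/table", df,
#                 storage_options=dynamodb_lock_options("delta_log_lock"))
```

Note this mechanism is separate from the Databricks commit service, which is exactly why coordinating the two kinds of writers on one table is the open question here.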
