
Delta Lake S3 multi-cluster writes - DynamoDB

JonLaRose
New Contributor III

Hi there!

I'm trying to figure out how the multi-writer architecture for Delta Lake tables is implemented under the hood.

I understand that a DynamoDB table is used to provide mutual exclusion, but the question is: where is the table located? Is it in the control plane or in the user's account?

If it's in the data plane, how can I provide permissions to create/update this specific table?

If it's in the control plane, why is it failing with the following error?

Py4JJavaError: An error occurred while calling o476.save.
: com.amazonaws.services.securitytoken.model.AWSSecurityTokenServiceException: The security token included in the request is invalid.

Thanks!

4 REPLIES

Kaniz_Fatma
Community Manager

Hi @JonLaRose, the multi-writer architecture for Delta Lake tables on Databricks relies on the S3 commit service:
- The S3 commit service ensures write consistency across multiple clusters writing to a single table.
- The service runs in the control plane and does not read any data from S3; it only puts a new commit file if one does not already exist.
- A DynamoDB table is not part of this path on Databricks: DynamoDB is used by the open-source S3DynamoDBLogStore, whereas Databricks clusters coordinate through the S3 commit service in the control plane.
- The S3 commit service is what makes ACID transactions and consistent commits possible on S3.
- The error (`com.amazonaws.services.securitytoken.model.AWSSecurityTokenServiceException: The security token included in the request is invalid`) is most likely caused by invalid or expired AWS credentials.
- The S3 commit service uses temporary AWS credentials passed from the data plane, valid for six hours.
- If those credentials are invalid or expired, this error occurs.


- To fix this, ensure your AWS credentials are valid and not expired (a quick way to check is sketched below).
- If using IAM roles, ensure they grant the permissions needed for the operations.
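
As a quick sanity check (my suggestion, not an official Databricks procedure), you can confirm from a notebook which AWS identity the cluster actually resolves and whether its token is still valid; everything below is illustrative:

# Sketch: verify the credentials the cluster resolves via STS.
import boto3
from botocore.exceptions import ClientError

try:
    identity = boto3.client("sts").get_caller_identity()
    print(f"Credentials OK, running as: {identity['Arn']}")
except ClientError as err:
    # An ExpiredToken / InvalidClientTokenId error here corresponds to the
    # AWSSecurityTokenServiceException seen in the failing Spark job.
    print(f"Credential check failed: {err}")

If this check fails, refresh the instance profile or credential configuration before retrying the write.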

JonLaRose
New Contributor III

Thank you, @Kaniz_Fatma.

Does the S3 commit service use the S3 endpoint configured for `s3a` (from the Spark session's Hadoop configuration)? If not, is there a way to configure the S3 endpoint that the S3 commit service uses?

Kaniz_Fatma
Community Manager

Hi @JonLaRose, the S3 commit service is a Databricks service that helps guarantee consistency of writes across multiple clusters on a single table in specific cases. It runs in the Databricks control plane and coordinates writes to Amazon S3 from multiple clusters.

Regarding your question: the commit service API call sends temporary AWS credentials from the compute plane to the control plane. The compute plane writes data directly to S3, and the S3 commit service in the control plane then provides concurrency control by finalizing the commit-log upload. The commit service does not read any data from S3; it puts a new file in S3 only if one does not already exist.

To access s3a:// files from Apache Spark™, you must pass some configurations (in spark-submit or on the Spark session's Hadoop configuration) and specify the endpoint. You can find more information on the Databricks S3 commit service-related settings in the Databricks documentation. I hope this helps!
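
For the endpoint part specifically, here is a minimal sketch of pointing `s3a` at a custom endpoint from a notebook (the `spark` session is the one Databricks provides; the endpoint URL is an example only). Note that this affects the cluster's own S3 reads and writes, not the control-plane commit service:

# Sketch: set the s3a endpoint on this session's Hadoop configuration.
spark.sparkContext._jsc.hadoopConfiguration().set(
    "fs.s3a.endpoint", "https://s3.eu-west-1.amazonaws.com"  # example endpoint
)

# Equivalent cluster-level Spark config (set at cluster creation):
# spark.hadoop.fs.s3a.endpoint https://s3.eu-west-1.amazonaws.com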

prem14f
New Contributor II

Hi, could you please help me here? How can I use this configuration in Databricks?
I want to maintain my transaction logs in DynamoDB so that, in parallel, a delta-rs job can write to the same table.

spark.conf.set("spark.delta.logStore.s3a.impl", "io.delta.storage.S3DynamoDBLogStore")
spark.conf.set("spark.io.delta.storage.S3DynamoDBLogStore.ddb.tableName", "delta_log")
spark.conf.set("spark.io.delta.storage.S3DynamoDBLogStore.ddb.region", "eu-west-1")
