Handling Concurrent Writes to a Delta Table from delta-rs and a Databricks Spark Job
07-30-2024 03:06 AM
Hi @dennyglee, @Retired_mod.
If I am writing data into a Delta table from both delta-rs and a Databricks job and some transactions are lost, how can I handle this?
Given that Databricks runs a commit service and delta-rs uses DynamoDB for transaction logs, how can we handle concurrent writers from Databricks jobs and delta-rs writers on the same table?
08-07-2024 06:39 AM
Hi @prem14f, a few strategies work together here:

- Retry logic with idempotent writes: wrap writes in automatic retries with a delay so a transient commit conflict does not drop a transaction, and make the writes idempotent so a retry cannot duplicate data (sketches below).
- Optimistic concurrency control: Delta Lake detects conflicts at commit time; the losing writer can re-read the table state and re-attempt its commit rather than failing the job.
- Partitioning: partition your Delta table so concurrent writers touch disjoint files, which reduces the likelihood of conflicts in the first place.
- Transaction-log configuration: make sure every writer, Databricks jobs and delta-rs alike, is properly configured with access to the table's transaction log (delta-rs example below).
- Monitoring and alerts: track failed or retried commits, and have a conflict resolution strategy, so issues are addressed promptly.
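
For the retry piece, here is a minimal PySpark sketch. It assumes the delta-spark Python package is available (that is where `delta.exceptions.ConcurrentAppendException` lives; exception names can vary by version), and the table path, function name, and retry parameters are all hypothetical:

```python
import time

from delta.exceptions import ConcurrentAppendException


def write_with_retry(df, path, max_retries=5, base_delay_s=2):
    """Append df to a Delta table, retrying when a concurrent commit wins."""
    for attempt in range(1, max_retries + 1):
        try:
            df.write.format("delta").mode("append").save(path)
            return
        except ConcurrentAppendException:
            if attempt == max_retries:
                raise
            # Exponential backoff: give the competing writer time to finish,
            # then retry against the table's new state.
            time.sleep(base_delay_s * 2 ** (attempt - 1))


# Hypothetical usage:
# write_with_retry(events_df, "s3://my-bucket/tables/events")
```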
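To make those retries safe against duplication, Delta Lake supports idempotent writes through the `txnAppId` and `txnVersion` writer options: if a batch with the same application ID and version has already committed, the write is skipped. A sketch, assuming a hypothetical `batch_id` that increases monotonically per batch (availability of these options depends on your Delta Lake / Databricks Runtime version):

```python
(events_df.write.format("delta")
    .mode("append")
    .option("txnAppId", "nightly-ingest")  # stable ID for this pipeline (hypothetical name)
    .option("txnVersion", batch_id)        # monotonically increasing batch number
    .save("s3://my-bucket/tables/events")) # hypothetical path
```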
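On the delta-rs side, as you noted, commit coordination on S3 goes through DynamoDB. Below is a minimal sketch with the Python `deltalake` package; the storage-option key names have changed across delta-rs versions, so treat these as placeholders and check the docs for your release. Also note that this DynamoDB mechanism is distinct from the Databricks commit service, so verify that both writers are actually sharing a compatible commit protocol on the same table before mixing them:

```python
import pandas as pd
from deltalake import write_deltalake

# Option keys as used in recent delta-rs releases; older releases used
# different names (e.g. a lock-table key) -- check the docs for your version.
storage_options = {
    "AWS_S3_LOCKING_PROVIDER": "dynamodb",
    "DELTA_DYNAMO_TABLE_NAME": "delta_log",  # hypothetical DynamoDB table name
}

df = pd.DataFrame({"id": [1, 2, 3]})
write_deltalake(
    "s3://my-bucket/tables/events",  # hypothetical path
    df,
    mode="append",
    storage_options=storage_options,
)
```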
Is there a specific part of this process you’d like to dive deeper into?

