05-06-2025 04:13 AM - edited 05-06-2025 04:16 AM
I am running a MERGE into the same table in parallel via two jobs.
The table is a liquid clustered table with the following properties:
delta.enableChangeDataFeed=true
delta.enableDeletionVectors=true
delta.enableRowTracking=true
delta.feature.changeDataFeed=supported
delta.feature.clustering=supported
delta.feature.deletionVectors=supported
delta.feature.domainMetadata=supported
delta.feature.rowTracking=supported
delta.isolationLevel=Serializable
Given that row-level tracking, deletion vectors, and Liquid Clustering are enabled, I expected concurrent write conflicts to be avoided, especially since the two jobs modify non-overlapping rows. However, I am still encountering concurrent modification errors (e.g., concurrent delete exceptions). Is there something I'm missing, or is this the expected behavior under certain conditions?
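Roughly, each of the two jobs runs a statement like the following (the table and column names here are placeholders, not my actual job code):

```python
# Minimal sketch of one of the two parallel jobs (placeholder names throughout).
# Both jobs target the same liquid clustered table but touch non-overlapping rows.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("""
    MERGE INTO target_table AS t
    USING staging_updates AS s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```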
05-06-2025 04:35 AM
Hey @flashmav, keep in mind that operations in Delta Lake often occur at the file level rather than the row level. For example, if two sessions attempt to update data in the same file (even if they're not updating the same row), you may encounter a race condition, resulting in one session throwing an error. It's important to remember that Delta Lake is not designed for OLTP (Online Transaction Processing) scenarios; it's optimized for analytics use cases. The ACID transactions supported by Delta are limited in scope. With this context, here are some suggestions to consider:
- Conflicts from concurrent MERGE operations on a liquid clustered table with row-level tracking and deletion vectors enabled are expected behavior under certain circumstances.
- A MERGE involves both reads and writes, which can conflict when file modification timestamps or metadata tracking indicate overlapping changes.
- Row tracking and deletion vectors (delta.enableRowTracking=true and delta.enableDeletionVectors=true) improve conflict management but do not completely eliminate the possibility of conflicts.
- Concurrent write operations (e.g., MERGE or DELETE commands) can therefore still raise exceptions such as ConcurrentDeleteReadException or ConcurrentDeleteDeleteException.
- Your table uses the strictest isolation level (delta.isolationLevel=Serializable). While this ensures strict serial execution semantics for transactions, it increases the likelihood of conflict detection when concurrent jobs attempt simultaneous write operations.
- Refactor your MERGE operations to include explicit predicates that clearly denote non-overlapping data regions in the target table, for example additional filters on distinct partitions or key ranges that limit potential overlaps (see the first sketch after this list).
- Schedule regular OPTIMIZE operations if your table is undergoing heavy ingestion or transactional churn; this reduces the potential for multiple transactions performing concurrent rewrites on the same file.
- Consider relaxing the isolation level (delta.isolationLevel=WriteSerializable) if strict serializability is not a hard requirement. Note, however, that this trade-off allows certain operations to reorder in history (both statements are sketched below).
- Review the specific exceptions raised (e.g., ConcurrentDeleteReadException, ConcurrentTransactionException) to fine-tune job parameters and retry logic; a bounded-retry sketch closes out the examples below.
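For the predicate suggestion above, here is a minimal sketch; the table name, key column, and range boundaries are assumptions for illustration:

```python
# Hypothetical sketch: each job restricts both the source and the ON clause to a
# disjoint key range, so the two transactions read and rewrite different files.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def run_merge(lower: int, upper: int) -> None:
    """MERGE only keys in [lower, upper); each concurrent job gets its own range."""
    spark.sql(f"""
        MERGE INTO target_table AS t
        USING (SELECT * FROM staging_updates
               WHERE id >= {lower} AND id < {upper}) AS s
        ON t.id = s.id
           AND t.id >= {lower} AND t.id < {upper}  -- explicit predicate on the target
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)

# Job 1 could call run_merge(0, 1_000_000) while job 2 calls run_merge(1_000_000, 2_000_000).
```

This helps most when the key is also a clustering column, so that disjoint key ranges actually map to disjoint files.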
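The maintenance and isolation-level points reduce to two statements (again a sketch with a placeholder table name):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Compact and re-cluster; on a liquid clustered table OPTIMIZE clusters incrementally
# (no ZORDER clause), so fewer concurrent transactions rewrite the same small files.
spark.sql("OPTIMIZE target_table")

# Relax Serializable to WriteSerializable if strict serializability is not required;
# this reduces conflict detection at the cost of allowing some history reordering.
spark.sql("""
    ALTER TABLE target_table
    SET TBLPROPERTIES ('delta.isolationLevel' = 'WriteSerializable')
""")
```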
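Finally, since these conflicts are transient, a common pattern is a bounded retry around the MERGE. Below is a sketch using the concurrency exception classes exposed by the delta-spark Python package; the helper and backoff policy are assumptions, not a prescribed API:

```python
import time

# These exception classes come from the delta-spark Python package.
from delta.exceptions import (
    ConcurrentDeleteReadException,
    ConcurrentDeleteDeleteException,
    ConcurrentTransactionException,
)

def merge_with_retry(run_merge, max_attempts: int = 3) -> None:
    """Retry a MERGE callable a few times when Delta detects a concurrent conflict."""
    for attempt in range(1, max_attempts + 1):
        try:
            run_merge()
            return
        except (ConcurrentDeleteReadException,
                ConcurrentDeleteDeleteException,
                ConcurrentTransactionException):
            if attempt == max_attempts:
                raise
            # Exponential backoff: the conflicting transaction usually commits quickly.
            time.sleep(2 ** attempt)
```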