Liquid Cluster enabled table - concurrent writes
01-06-2025 12:53 PM
I am trying to insert rows into a Liquid Clustering enabled Delta table using multiple threads. This link states that liquid clustering is used for: tables with concurrent write requirements.
I get this error: [DELTA_CONCURRENT_APPEND] ConcurrentAppendException: Files were added to the root of the table by a concurrent update. Please try the operation again.
How do I insert records in parallel?
Thanks,
01-06-2025 12:58 PM
To address the ConcurrentAppendException error when inserting records in parallel into a Liquid Clustering enabled Delta table, you can consider the following approaches:
- Isolation levels and write conflicts:
  - Ensure that the isolation level of your table is set appropriately. The default isolation level is WriteSerializable, which ensures that write operations are serializable but allows for some concurrency. If you need stricter isolation, you can set the isolation level to Serializable using the following command:
    ALTER TABLE <table_name> SET TBLPROPERTIES ('delta.isolationLevel' = 'Serializable');
  - Be aware that stricter isolation levels may reduce concurrency and increase the likelihood of conflicts.
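Regardless of isolation level, concurrent writers should be prepared to retry when a conflict is detected. A minimal retry-with-backoff sketch in plain Python, where `ConcurrentAppendException` is a stand-in for the Delta exception and `write_batch` is a hypothetical function that performs one append:

```python
import random
import time

class ConcurrentAppendException(Exception):
    """Stand-in for Delta's ConcurrentAppendException."""

def append_with_retry(write_batch, max_attempts=5, base_delay=0.1):
    """Run write_batch, retrying on concurrent-append conflicts
    with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return write_batch()
        except ConcurrentAppendException:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            # Sleep a randomized, exponentially growing delay before retrying.
            time.sleep(base_delay * (2 ** attempt) * random.random())
```

In a real job, the except clause would catch the exception class shipped with your Delta/Spark runtime instead of the stand-in defined here.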
- Row-level concurrency:
  - Ensure that row-level concurrency is enabled. This feature is generally available in Databricks Runtime 14.2 and above and helps reduce conflicts between concurrent write operations by detecting changes at the row level.
  - Row-level concurrency is supported for tables with deletion vectors enabled and without partitioning. Ensure that your table meets these conditions.
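If row-level concurrency does not seem to be taking effect, one thing worth checking is that deletion vectors are enabled on the table. A sketch of the table property to set (the table name is a placeholder):

```sql
ALTER TABLE <table_name> SET TBLPROPERTIES ('delta.enableDeletionVectors' = 'true');
```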
- Avoiding conflicts:
  - To avoid conflicts, you can make the separation explicit in the operation condition. For example, if your table is partitioned by date and country, you can use the following merge operation:
    deltaTable.as("t")
      .merge(source.as("s"),
             "s.user_id = t.user_id AND s.date = t.date AND s.country = t.country")
      .whenMatched().updateAll()
      .whenNotMatched().insertAll()
      .execute()
  - This ensures that the operations are disjoint and do not conflict with each other.
- Sequential execution:
  - If possible, avoid running the final pivot task in parallel. Instead, run it once all the parallel staging tasks have finished. This approach can help prevent conflicts that arise from concurrent writes to the same table.
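The staging-then-final pattern can be sketched in plain Python with `concurrent.futures`: the per-partition staging writes run in parallel, and the consolidation step runs exactly once after they all complete. `stage_write` and `final_task` here are hypothetical callables standing in for your actual Spark jobs:

```python
from concurrent.futures import ThreadPoolExecutor

def run_staged(stage_inputs, stage_write, final_task):
    """Run independent staging writes in parallel, then run the
    final consolidation task once, after all staging work is done."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map blocks until every staging task has finished.
        staged = list(pool.map(stage_write, stage_inputs))
    # Only one writer touches the final table, so no concurrent conflict.
    return final_task(staged)
```

Because the final task starts only after the executor has drained, the table written in the last step never sees two writers at once.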
- Optimize command:
  - Regularly run the OPTIMIZE command to ensure that data is efficiently clustered. For tables experiencing many updates or inserts, schedule an OPTIMIZE job every one or two hours.
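For example, a scheduled job could simply run the following statement against the clustered table (the table name is a placeholder):

```sql
OPTIMIZE <table_name>;
```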
01-07-2025 01:24 AM
We encountered a similar issue as well; the workaround we tried was partitioning on those columns instead, as Liquid Clustering can sometimes trigger this error.

