Row-level Concurrency and Liquid Clustering compatibility
12-29-2023 09:00 AM
The documentation is a little ambiguous:
"Row-level concurrency is only supported on tables without partitioning, which includes tables with liquid clustering."
https://docs.databricks.com/en/release-notes/runtime/14.2.html
"Tables with liquid clustering enabled support row-level concurrency in Databricks Runtime 13.3 LTS and above. Row-level concurrency is generally available in Databricks Runtime 14.2 and above for all tables with deletion vectors enabled."
https://docs.databricks.com/en/delta/clustering.html
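One way to reconcile the two quoted statements is to write them out as a single check. The sketch below is only one reading of the quoted text, not an official rule from Databricks; the function name and its parameters are hypothetical and exist purely for illustration.

```python
def supports_row_level_concurrency(dbr_version, partitioned,
                                   liquid_clustered, deletion_vectors):
    """Return True if, under one reading of the quoted docs, the table
    would get row-level concurrency.

    dbr_version is a (major, minor) tuple such as (13, 3) or (14, 2).
    All names here are illustrative assumptions, not a Databricks API.
    """
    # "only supported on tables without partitioning"
    if partitioned:
        return False
    # "Tables with liquid clustering enabled support row-level
    # concurrency in Databricks Runtime 13.3 LTS and above."
    if liquid_clustered and dbr_version >= (13, 3):
        return True
    # "generally available in Databricks Runtime 14.2 and above for
    # all tables with deletion vectors enabled."
    return bool(deletion_vectors) and dbr_version >= (14, 2)
```

Read this way, the two passages do not conflict: a liquid-clustered table is by definition unpartitioned, so it falls under the "without partitioning" wording of the release note.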
Also, is there a method to enable cluster-on-write for MERGE INTO statements?
"Most operations do not automatically cluster data on write. Operations that cluster on write include the following:
- INSERT INTO operations
- CTAS statements
- COPY INTO from Parquet format
- spark.write.format("delta").mode("append")"
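For reference, the last item in that list could look roughly like the sketch below. It assumes `df` is a PySpark DataFrame and that the target table already exists with liquid clustering enabled; the helper name and the table name in the usage comment are hypothetical.

```python
def append_to_clustered_table(df, table_name):
    """Append df to an existing Delta table using the append write path
    that the quoted docs list as clustering on write.

    Assumes the table was created with liquid clustering (CLUSTER BY).
    """
    (df.write
       .format("delta")
       .mode("append")
       .saveAsTable(table_name))

# Hypothetical usage in a notebook:
# append_to_clustered_table(updates_df, "main.sales.orders")
```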
Labels: Delta Lake, Spark, Workflows
01-16-2024 04:10 PM
It is recommended to use DBR 14.2 or above for its default row-level concurrency support. Since there isn't a way to enable cluster-on-write for MERGE INTO statements, you can consider clustering the source data before merging it.
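A different workaround, not the one suggested in this reply, is to run OPTIMIZE on the target after the merge, since OPTIMIZE triggers clustering on liquid-clustered tables. The sketch below only builds the two SQL statements; the helper name, table names, and key column are hypothetical, and the merge clauses assume source and target share the same schema.

```python
def merge_then_optimize_sql(target, source, key):
    """Build a MERGE INTO statement followed by an OPTIMIZE statement.

    MERGE does not cluster on write, so the OPTIMIZE afterwards is what
    clusters the newly written data. All names are illustrative.
    """
    merge_stmt = (
        f"MERGE INTO {target} AS t "
        f"USING {source} AS s "
        f"ON t.{key} = s.{key} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )
    optimize_stmt = f"OPTIMIZE {target}"
    return [merge_stmt, optimize_stmt]

# Hypothetical usage in a notebook where `spark` is available:
# for stmt in merge_then_optimize_sql(
#         "main.sales.orders", "main.sales.orders_updates", "order_id"):
#     spark.sql(stmt)
```

How often to run the OPTIMIZE step depends on the workload; running it after every small merge may cost more than it saves.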
01-16-2024 04:57 PM
Cluster-on-write is something being worked on. The limitations at the moment have to do with accommodating streaming workloads.
I found the following informative:
https://www.youtube.com/watch?v=5t6wX28JC_M

