Databricks Community

JasonThomas · 12-29-2023

The documentation is a little ambiguous about which tables support row-level concurrency, and from which runtime version:

"Row-level concurrency is only supported on tables without partitioning, which includes tables with liquid clustering."

https://docs.databricks.com/en/release-notes/runtime/14.2.html
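To show what row-level concurrency changes, here is a toy Python model. It is not Delta Lake code, and the `Write` class and `conflicts` function are hypothetical. It simplifies a write to the set of rows it updated or deleted in each data file. Under file-level detection, two writers that modify different rows of the same file conflict. Under row-level detection, they do not.

```python
# Toy model only: this is NOT Delta Lake's conflict checker. It contrasts
# file-level and row-level conflict detection for two concurrent writers.
from dataclasses import dataclass, field


@dataclass
class Write:
    """Hypothetical record of the rows a transaction updated or deleted, per data file."""
    touched: dict[str, set[int]] = field(default_factory=dict)

    def touch(self, data_file: str, row: int) -> "Write":
        # Record that this transaction modified `row` inside `data_file`.
        self.touched.setdefault(data_file, set()).add(row)
        return self


def conflicts(a: Write, b: Write, row_level: bool) -> bool:
    """Return True if the two writes would conflict at the chosen granularity."""
    shared_files = a.touched.keys() & b.touched.keys()
    if not row_level:
        # File-level: modifying any common data file is a conflict.
        return bool(shared_files)
    # Row-level: only modifying the same row of a common file is a conflict.
    return any(a.touched[f] & b.touched[f] for f in shared_files)


# Two writers modify different rows that happen to live in the same file.
w1 = Write().touch("part-0001.parquet", 10)
w2 = Write().touch("part-0001.parquet", 42)
print(conflicts(w1, w2, row_level=False))  # True
print(conflicts(w1, w2, row_level=True))   # False
```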

Tables with liquid clustering enabled support row-level concurrency in Databricks Runtime 13.3 LTS and above. Row-level concurrency is generally available in Databricks Runtime 14.2 and above for all tables with deletion vectors enabled.

https://docs.databricks.com/en/delta/clustering.html
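Here is a minimal sketch of a table that fits both quotes. It is unpartitioned, liquid-clustered with `CLUSTER BY`, and has deletion vectors enabled through the `delta.enableDeletionVectors` table property. The sketch assumes a runtime that supports `CLUSTER BY` and an existing `spark` session. The table and column names are placeholders. The sketch sets no separate row-level concurrency flag, based on the reading of the 14.2 note that the feature applies automatically to tables with deletion vectors enabled.

```python
# Sketch, assuming Databricks Runtime 13.3 LTS+ (liquid clustering) and an
# existing SparkSession passed in as `spark`. Names are placeholders.


def create_table_ddl(table: str, columns: dict[str, str], cluster_by: list[str]) -> str:
    """Build CREATE TABLE DDL using CLUSTER BY (no PARTITIONED BY) with deletion vectors on."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    keys = ", ".join(cluster_by)
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ({cols}) "
        f"CLUSTER BY ({keys}) "
        "TBLPROPERTIES ('delta.enableDeletionVectors' = true)"
    )


def create_table(spark, table: str, columns: dict[str, str], cluster_by: list[str]) -> None:
    """Execute the DDL on the given SparkSession."""
    spark.sql(create_table_ddl(table, columns, cluster_by))


# Example (placeholder names):
# create_table(spark, "main.default.events",
#              {"event_id": "BIGINT", "event_date": "DATE", "payload": "STRING"},
#              ["event_date"])
```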

Also, is there a method to enable cluster-on-write for MERGE INTO statements?

Most operations do not automatically cluster data on write. Operations that cluster on write include the following:

INSERT INTO operations
CTAS statements
COPY INTO from Parquet format
df.write.format("delta").mode("append")
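MERGE INTO is not on that list. One possible workaround, which is an assumption here and not part of the quoted excerpt, is to run `OPTIMIZE` on the liquid-clustered target after the merge, because on such tables `OPTIMIZE` clusters data that was written unclustered. The sketch assumes the source is registered as a view with the same columns as the target, so that `UPDATE SET *` and `INSERT *` apply. All names are placeholders.

```python
# Sketch: upsert with MERGE INTO, then OPTIMIZE so the merged rows get clustered.
# Assumes `source_view` has the same columns as `target`. Names are placeholders.


def merge_statements(target: str, source_view: str, key: str) -> list[str]:
    """Return the MERGE INTO statement followed by OPTIMIZE on the target."""
    merge = (
        f"MERGE INTO {target} AS t "
        f"USING {source_view} AS s "
        f"ON t.{key} = s.{key} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )
    # MERGE is not a cluster-on-write operation, so cluster afterwards.
    return [merge, f"OPTIMIZE {target}"]


def merge_then_optimize(spark, target: str, source_view: str, key: str) -> None:
    """Run the statements in order on the given SparkSession."""
    for statement in merge_statements(target, source_view, key):
        spark.sql(statement)
```

Running `OPTIMIZE` after every merge adds compute cost. Running it on a schedule instead is a workload-specific trade-off.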

SparkJun · 01-16-2024

It is recommended to use DBR 14.2 or above, where row-level concurrency is supported by default. There isn't a way to simply enable cluster-on-write for MERGE INTO statements, so you can consider clustering the source data before merging it.
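One reading of "clustering the source data before merging" is to range-partition and sort the source DataFrame on the target's clustering keys before the MERGE. The new files the merge writes are then more likely to cover narrow key ranges. This is a hedged sketch, not a documented Databricks recipe. Whether the layout survives depends on how the merge plans its join and write. The view name `clustered_source` and the other names are placeholders. `repartitionByRange` and `sortWithinPartitions` are standard PySpark DataFrame methods.

```python
# Sketch: cluster the source on the target's clustering keys, then MERGE.
# Assumes `source_df` has the same columns as `target`; names are placeholders.


def merge_condition(on: list[str]) -> str:
    """Build the ON clause joining target alias `t` to source alias `s`."""
    return " AND ".join(f"t.{col} = s.{col}" for col in on)


def cluster_then_merge(spark, source_df, target: str, on: list[str], cluster_keys: list[str]) -> None:
    """Range-partition and sort the source by `cluster_keys`, then upsert it into `target`."""
    clustered = source_df.repartitionByRange(*cluster_keys).sortWithinPartitions(*cluster_keys)
    clustered.createOrReplaceTempView("clustered_source")
    spark.sql(
        f"MERGE INTO {target} AS t USING clustered_source AS s "
        f"ON {merge_condition(on)} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )
```

The merge key (`on`) and the clustering keys are separate parameters because the column used to match rows is often not the one the table is clustered by.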

JasonThomas · 01-16-2024

Cluster-on-write is something being worked on. The limitations at the moment have to do with accommodating streaming workloads.

I found the following informative:

https://www.youtube.com/watch?v=5t6wX28JC_M