Row-level Concurrency and Liquid Clustering compatibility

JasonThomas — Fri, 29 Dec 2023 17:00:20 GMT

The documentation is a little ambiguous:

"Row-level concurrency is only supported on tables without partitioning, which includes tables with liquid clustering."

https://docs.databricks.com/en/release-notes/runtime/14.2.html

"Tables with liquid clustering enabled support row-level concurrency in Databricks Runtime 13.3 LTS and above. Row-level concurrency is generally available in Databricks Runtime 14.2 and above for all tables with deletion vectors enabled."

https://docs.databricks.com/en/delta/clustering.html

Also, is there a method to enable cluster-on-write for MERGE INTO statements?

Most operations do not automatically cluster data on write. Operations that cluster on write include the following:

INSERT INTO operations
CTAS statements
COPY INTO from Parquet format
spark.write.format("delta").mode("append")
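For reference, a minimal sketch of what relying on those cluster-on-write operations might look like. The table names (`events`, `staging_events`), the columns, and the helpers `clustered_table_statements` and `run_all` are hypothetical. The helper only builds SQL strings; actually executing them assumes a SparkSession on a Databricks runtime that supports liquid clustering.

```python
def clustered_table_statements(table, columns_ddl, cluster_cols, source):
    """Build SQL for a liquid-clustered table and an INSERT INTO load.

    All names passed in are illustrative, not taken from the thread.
    """
    cluster_by = ", ".join(cluster_cols)
    return [
        # CLUSTER BY (instead of PARTITIONED BY) enables liquid clustering.
        f"CREATE TABLE {table} ({columns_ddl}) CLUSTER BY ({cluster_by})",
        # Deletion vectors are what row-level concurrency relies on.
        f"ALTER TABLE {table} SET TBLPROPERTIES ('delta.enableDeletionVectors' = true)",
        # INSERT INTO is one of the operations listed as clustering on write.
        f"INSERT INTO {table} SELECT * FROM {source}",
    ]


def run_all(statements, spark=None):
    """Print the statements, or run them if a SparkSession is supplied."""
    for stmt in statements:
        if spark is None:
            print(stmt)
        else:
            spark.sql(stmt)


stmts = clustered_table_statements(
    "events", "event_id BIGINT, event_date DATE", ["event_date"], "staging_events"
)
run_all(stmts)  # dry run: prints the SQL only
```

In a notebook you would call `run_all(stmts, spark)` instead of the dry run.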

Re: Row-level Concurrency and Liquid Clustering compatibility

SparkJun — Wed, 17 Jan 2024 00:10:55 GMT

Using DBR 14.2 or above is recommended for its default row-level concurrency support. There isn't currently a way to enable cluster-on-write just for MERGE INTO statements, so you can consider clustering the source data before merging it.
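A rough sketch of one way to read that suggestion, under stated assumptions: load the source into a liquid-clustered staging table with INSERT INTO (which clusters on write), MERGE from the staging table, then run OPTIMIZE on the target, since OPTIMIZE is what triggers clustering for data that was not clustered on write. The table names, the key column, and the helper `merge_workflow` are hypothetical, and whether the staging step helps will depend on the workload.

```python
def merge_workflow(target, staging, raw_source, key):
    """Build the SQL for a stage -> MERGE -> OPTIMIZE sequence.

    Assumes `staging` already exists as a liquid-clustered table;
    every name here is illustrative.
    """
    return [
        # Step 1: INSERT INTO clusters the incoming rows on write.
        f"INSERT INTO {staging} SELECT * FROM {raw_source}",
        # Step 2: MERGE itself does not cluster on write.
        (
            f"MERGE INTO {target} AS t USING {staging} AS s "
            f"ON t.{key} = s.{key} "
            "WHEN MATCHED THEN UPDATE SET * "
            "WHEN NOT MATCHED THEN INSERT *"
        ),
        # Step 3: OPTIMIZE triggers clustering on the merged target.
        f"OPTIMIZE {target}",
    ]


steps = merge_workflow("events", "events_staging", "raw_events", "event_id")
for sql in steps:
    print(sql)  # on a cluster: spark.sql(sql)
```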

Re: Row-level Concurrency and Liquid Clustering compatibility

JasonThomas — Wed, 17 Jan 2024 00:57:12 GMT

Cluster-on-write is something being worked on. The limitations at the moment have to do with accommodating streaming workloads.

I found the following informative:

https://www.youtube.com/watch?v=5t6wX28JC_M