
Row-level Concurrency and Liquid Clustering compatibility

JasonThomas
New Contributor III

The documentation is a little ambiguous:

"Row-level concurrency is only supported on tables without partitioning, which includes tables with liquid clustering."

https://docs.databricks.com/en/release-notes/runtime/14.2.html

 Tables with liquid clustering enabled support row-level concurrency in Databricks Runtime 13.3 LTS and above. Row-level concurrency is generally available in Databricks Runtime 14.2 and above for all tables with deletion vectors enabled.

https://docs.databricks.com/en/delta/clustering.html
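One way to read the two passages quoted above together is as a small decision rule. The sketch below is only that reading, written as plain Python; the function name and its parameters are hypothetical, and it is not an official compatibility check.

```python
def supports_row_level_concurrency(dbr_version, liquid_clustering,
                                   partitioned, deletion_vectors):
    """Encode one reading of the two quoted documentation passages.

    dbr_version is a (major, minor) tuple, e.g. (14, 2).
    All parameter names are assumptions made for this illustration.
    """
    # First quote: only tables without partitioning are supported.
    if partitioned:
        return False
    # Second quote: liquid clustering tables are covered from 13.3 LTS.
    if liquid_clustering and dbr_version >= (13, 3):
        return True
    # Second quote: generally available from 14.2 with deletion vectors.
    return deletion_vectors and dbr_version >= (14, 2)


if __name__ == "__main__":
    print(supports_row_level_concurrency((13, 3), True, False, False))
    print(supports_row_level_concurrency((14, 2), False, True, True))
```

Under this reading the quotes do not conflict: a liquid clustering table counts as a table without partitioning, so the first sentence does not exclude it.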

Also, is there a method to enable cluster-on-write for MERGE INTO statements?

Most operations do not automatically cluster data on write. Operations that cluster on write include the following:

  • INSERT INTO operations

  • CTAS statements

  • COPY INTO from Parquet format

  • spark.write.format("delta").mode("append")

2 REPLIES

JunYang
New Contributor III

It is recommended to use DBR 14.2 or above for its default row-level concurrency support. There currently isn't a way to enable cluster-on-write for MERGE INTO statements. As a workaround, you can consider clustering the source data before merging it.
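A minimal sketch of that workaround, assuming the source is first staged into a liquid clustering table with a CTAS statement (one of the operations listed in the question as clustering on write) and then merged. The trailing OPTIMIZE step is an addition that is not mentioned in this reply; it relies on OPTIMIZE being the command that triggers clustering on a liquid clustering table. The helper name and all table and column names are hypothetical, and the function only builds SQL strings so it can be inspected without a cluster.

```python
def merge_with_clustering_statements(target, source, staging, key, cluster_cols):
    """Build SQL for a stage -> MERGE -> OPTIMIZE workflow (illustrative only)."""
    cluster_by = ", ".join(cluster_cols)
    return [
        # Stage the source with CTAS so the staged data is clustered on write.
        f"CREATE OR REPLACE TABLE {staging} CLUSTER BY ({cluster_by}) "
        f"AS SELECT * FROM {source}",
        # MERGE itself does not cluster on write.
        f"MERGE INTO {target} AS t USING {staging} AS s ON t.{key} = s.{key} "
        "WHEN MATCHED THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT *",
        # Assumption: run OPTIMIZE afterwards to cluster what MERGE wrote.
        f"OPTIMIZE {target}",
    ]


if __name__ == "__main__":
    statements = merge_with_clustering_statements(
        target="main.sales.orders",          # hypothetical target table
        source="main.sales.orders_updates",  # hypothetical source table
        staging="main.sales.orders_staging", # hypothetical staging table
        key="order_id",
        cluster_cols=["order_date", "customer_id"],
    )
    for stmt in statements:
        print(stmt)  # in a notebook, each could be passed to spark.sql(stmt)
```

Whether pre-clustering the source noticeably improves the layout of the files that MERGE writes depends on the workload, so it is worth checking on your own tables.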

JasonThomas
New Contributor III

Cluster-on-write is something being worked on. The limitations at the moment have to do with accommodating streaming workloads.

I found the following informative:

https://www.youtube.com/watch?v=5t6wX28JC_M
