Liquid Clustering is an innovative data management technique that significantly simplifies decisions about data layout: you only have to choose clustering keys based on query access patterns. Thousands of customers have benefited from better query performance with Liquid Clustering, and more than 3,000 monthly active customers now write over 200 PB of data to Liquid-clustered tables each month.
If you are still using partitioning to manage multiple writers, you are missing out on a key feature of Liquid Clustering: row-level concurrency.
In this blog post, we’ll explain how Databricks delivers out-of-the-box concurrency guarantees for customers with concurrent modifications on their tables. Row-level concurrency lets you focus on extracting business insights by eliminating the need to design complex data layouts or coordinate workloads, simplifying your code and data pipelines.
Row-level concurrency is automatically enabled when you use Liquid Clustering. It is also enabled with deletion vectors when using Databricks Runtime 14.2+. If you have concurrent modifications that frequently fail with ConcurrentAppendException or ConcurrentUpdateException, enable Liquid Clustering or deletion vectors on your table today to get row-level conflict detection and reduce conflicts. Getting started is simple:
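As a minimal sketch of the two options just described, the Python helpers below build the SQL statements that turn on Liquid Clustering or deletion vectors for an existing Delta table. The helper names, the table name `main.sales.orders`, and the clustering key `customer_id` are hypothetical and chosen only for illustration; the optional `spark` argument is assumed to be the SparkSession that a Databricks notebook provides.

```python
def liquid_clustering_sql(table, keys):
    """Build the statement that sets Liquid Clustering keys on a table."""
    if not keys:
        raise ValueError("at least one clustering key is required")
    return f"ALTER TABLE {table} CLUSTER BY ({', '.join(keys)})"


def deletion_vectors_sql(table):
    """Build the statement that enables deletion vectors on a table."""
    return (
        f"ALTER TABLE {table} "
        "SET TBLPROPERTIES ('delta.enableDeletionVectors' = true)"
    )


def enable_row_level_concurrency(table, keys=None, spark=None):
    """Pick one of the two options and optionally run it.

    With clustering keys, the table is switched to Liquid Clustering;
    without keys, deletion vectors are enabled instead. The statement is
    executed only when a SparkSession is passed in (an assumption about
    the caller's environment), and it is always returned for inspection.
    """
    if keys:
        statement = liquid_clustering_sql(table, keys)
    else:
        statement = deletion_vectors_sql(table)
    if spark is not None:
        spark.sql(statement)
    return statement


# Hypothetical table and key names, shown without a live Spark session.
print(enable_row_level_concurrency("main.sales.orders", keys=["customer_id"]))
print(enable_row_level_concurrency("main.sales.orders"))
```

In a notebook you would pass `spark=spark` to run the statement rather than only print it. Choose clustering keys from the columns your queries filter on most often, as noted above.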
Read on for a deep dive into how row-level concurrency automatically handles concurrent writes modifying the same file.