
OPTIMIZE in parallel with actual data load

noorbasha534
Valued Contributor II

Dear all

If I understand correctly, OPTIMIZE cannot run in parallel with the actual data load. We see 'concurrent update' errors in our environment when this happens, and we are unable to dedicate a maintenance window for table health.

I also saw a presentation from DAIS 2025 that says liquid clustering can run in parallel with the actual data load.

Please correct my understanding if it is wrong.

Appreciate the mindshare...

 

Accepted Solution

szymon_dybczak
Esteemed Contributor III

Hi @noorbasha534 ,

Yes, liquid clustering optimization can be executed on Delta tables automatically or manually: at write time with Auto Compaction enabled, or at any time using the OPTIMIZE command, respectively.

Liquid Clustering - The Internals of Delta Lake

Additionally, this is mentioned in the blog post below. Look for "clustering on write":

Announcing General Availability of Liquid Clustering | Databricks Blog
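
If a manually triggered OPTIMIZE still collides with a running load, one common pattern is to retry it with a backoff. The following is only a minimal sketch of that idea, not an official Databricks recipe. The function name `optimize_with_retry` and its parameters are hypothetical; `run_sql` is assumed to be something like `spark.sql`, and `conflict_errors` is assumed to be whatever exception classes your runtime raises on a write conflict (for example the classes in `delta.exceptions`, if that module is available to you).

```python
import time
from typing import Callable, Tuple, Type


def optimize_with_retry(
    run_sql: Callable[[str], object],
    table_name: str,
    conflict_errors: Tuple[Type[BaseException], ...],
    max_attempts: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run OPTIMIZE on table_name, retrying when a conflict error is raised.

    Hypothetical helper for illustration. Returns the number of attempts
    used and re-raises the last conflict error if every attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        try:
            # table_name is interpolated as-is; only pass trusted identifiers.
            run_sql(f"OPTIMIZE {table_name}")
            return attempt
        except conflict_errors:
            if attempt == max_attempts:
                raise
            # Exponential backoff: base, 2 * base, 4 * base, ...
            sleep(base_delay_s * 2 ** (attempt - 1))

    raise AssertionError("unreachable")
```

In a notebook this might be called as `optimize_with_retry(spark.sql, "my_catalog.my_schema.my_table", (SomeConflictError,))`, where the table name and the exception class are placeholders for your own.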

Replies

MariuszK
Valued Contributor III

Liquid clustering reorganizes data incrementally, which is faster because it optimizes only new data. Unlike Z-ordering, it uses a different data-organization algorithm (a Hilbert curve) that allows incremental clustering.
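
To give a feel for what a Hilbert curve does, here is a small sketch of the textbook 2-D Hilbert index. It is only an illustration of the general idea and is not Delta Lake's implementation; the function name `hilbert_index` and the grid setup are my own assumptions.

```python
def hilbert_index(order: int, x: int, y: int) -> int:
    """Map cell (x, y) on a 2**order x 2**order grid to its position
    along a Hilbert curve (textbook algorithm, for illustration only)."""
    n = 1 << order
    if not (0 <= x < n and 0 <= y < n):
        raise ValueError("x and y must lie inside the grid")

    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        # Each quadrant contributes a block of s * s positions.
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


# Cells of a 2 x 2 grid, listed in curve order.
cells = sorted(
    ((x, y) for x in range(2) for y in range(2)),
    key=lambda c: hilbert_index(1, c[0], c[1]),
)
print(cells)  # [(0, 0), (0, 1), (1, 1), (1, 0)]
```

Cells that are next to each other on the curve are also next to each other on the grid, so sorting rows by such an index tends to keep rows with similar values of both columns close together.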

noorbasha534
Valued Contributor II

@MariuszK This does not answer my question. Can I run OPTIMIZE in parallel with the data load of a liquid clustered table?

MariuszK
Valued Contributor III

@noorbasha534, this is a good question, and it is worth clarifying. According to the documentation, yes, but honestly, I haven't had a chance to test it in the scenario you describe.

noorbasha534
Valued Contributor II

@MariuszK @szymon_dybczak Thanks, both. I appreciate your support.