
How can I run OPTIMIZE on a table if I am streaming to it 24/7?

User16826992666
Valued Contributor

I have a table that I need to stream into continuously. I know it's best practice to run OPTIMIZE on my tables periodically. But if I never stop writing to the table, how and when can I run OPTIMIZE against it?

1 REPLY

User16869510359
Esteemed Contributor

If the streaming job is making blind appends to the Delta table, then it's perfectly fine to run an OPTIMIZE query in parallel.
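For the blind-append case, a separate scheduled job could issue the OPTIMIZE statement while the stream keeps running. Here is a minimal sketch of that idea; the helper `optimize_statement`, the table name `events`, and the partition column `date` are hypothetical, and the optional WHERE clause assumes the predicate is on a partition column.

```python
from typing import Optional


def optimize_statement(table: str, where: Optional[str] = None) -> str:
    """Build a Delta Lake OPTIMIZE statement, optionally limited by a predicate."""
    stmt = f"OPTIMIZE {table}"
    if where:
        # Restricting OPTIMIZE to recent partitions keeps each run short.
        stmt += f" WHERE {where}"
    return stmt


# In a separate scheduled job or notebook with an active `spark` session
# (table and column names are hypothetical):
#   spark.sql(optimize_statement("events", "date >= '2024-01-01'"))
```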

However, if the streaming job performs MERGE or UPDATE operations, it can conflict with a concurrently running OPTIMIZE. In that case, you can write custom logic so that the streaming job runs OPTIMIZE itself, for example once every 100 batches.
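One way to sketch the "every 100 batches" idea is a `foreachBatch` handler that applies the micro-batch and then, on every Nth batch, runs OPTIMIZE from the same job, so the two never run concurrently. The factory `make_batch_handler` and its parameters are hypothetical names for illustration, and the Spark wiring in the comments assumes a Delta table called `target` keyed on `id`.

```python
from typing import Any, Callable


def make_batch_handler(
    apply_batch: Callable[[Any, int], None],
    run_optimize: Callable[[], None],
    optimize_every: int = 100,
) -> Callable[[Any, int], None]:
    """Return a foreachBatch-style handler that runs OPTIMIZE periodically."""
    if optimize_every < 1:
        raise ValueError("optimize_every must be >= 1")

    def handle(batch_df: Any, batch_id: int) -> None:
        # Apply the micro-batch first (for example a MERGE into the target table).
        apply_batch(batch_df, batch_id)
        # batch_id starts at 0, so (batch_id + 1) triggers after every Nth batch.
        if (batch_id + 1) % optimize_every == 0:
            run_optimize()

    return handle


# Wiring sketch (needs a Spark session with Delta Lake; all names are hypothetical):
#
#   from delta.tables import DeltaTable
#
#   def apply_batch(batch_df, batch_id):
#       (DeltaTable.forName(spark, "target").alias("t")
#           .merge(batch_df.alias("s"), "t.id = s.id")
#           .whenMatchedUpdateAll()
#           .whenNotMatchedInsertAll()
#           .execute())
#
#   handler = make_batch_handler(apply_batch, lambda: spark.sql("OPTIMIZE target"))
#   (stream_df.writeStream
#       .foreachBatch(handler)
#       .option("checkpointLocation", "/path/to/checkpoint")
#       .start())
```

Because the OPTIMIZE call runs inside the batch handler, the next micro-batch does not start until compaction finishes, so a larger `optimize_every` trades fewer pauses for more small files between runs.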

The operations that can conflict with each other are listed here:

https://docs.databricks.com/delta/concurrency-control.html#write-conflicts
