How can I run OPTIMIZE on a table if I am streaming to it 24/7?

User16826992666
Valued Contributor

I have a table that I need to stream into continuously. I know it's best practice to run OPTIMIZE on my tables periodically. But if I never stop writing to the table, how and when can I run OPTIMIZE against it?


brickster_2018
Databricks Employee

If the streaming job only performs blind appends to the Delta table, it is perfectly fine to run OPTIMIZE in parallel with it.
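
For the blind-append case, a minimal sketch of running OPTIMIZE on a timer while the stream keeps writing might look like the following. The helper name `start_periodic_optimize`, the table name `events`, and the `run_sql` callback (for example `spark.sql`) are illustrative assumptions, not a Databricks API; a separately scheduled job that runs the same statement would work just as well.

```python
import threading
from typing import Callable, Tuple


# Hypothetical helper (not part of any Databricks API): periodically runs
# OPTIMIZE on a table that only receives blind appends from a stream.
def start_periodic_optimize(
    run_sql: Callable[[str], object],
    table_name: str,
    interval_seconds: float,
) -> Tuple[threading.Thread, threading.Event]:
    """Run `OPTIMIZE <table_name>` every `interval_seconds` on a background thread.

    Returns the thread and an Event; set the Event to stop the loop.
    """
    stop = threading.Event()

    def loop() -> None:
        # Event.wait returns False on timeout (run OPTIMIZE again) and
        # True once stop is set (exit the loop).
        while not stop.wait(interval_seconds):
            run_sql(f"OPTIMIZE {table_name}")

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread, stop
```

On Databricks you might call `start_periodic_optimize(spark.sql, "events", 3600)` from the driver that also runs the append stream. This only makes sense because appends and OPTIMIZE do not conflict; it is not a safe pattern for streams that MERGE or UPDATE.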

However, if the streaming job performs MERGE or UPDATE, it can conflict with concurrent OPTIMIZE operations. In that case, you can write custom logic inside the streaming job so that OPTIMIZE runs as part of the job itself, for example once every 100 batches.
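
One way to express "OPTIMIZE every 100 batches" is a `foreachBatch` function that applies the micro-batch write and then, on every Nth batch, runs OPTIMIZE. The sketch below assumes this approach; `make_batch_handler`, `write_batch`, and `optimize` are hypothetical names, injected as callbacks so the scheduling logic is independent of Spark.

```python
from typing import Any, Callable


# Hypothetical helper (not a Databricks API): builds a foreachBatch function
# that writes each micro-batch and runs OPTIMIZE every `every_n_batches`.
def make_batch_handler(
    write_batch: Callable[[Any], None],
    optimize: Callable[[], None],
    every_n_batches: int = 100,
) -> Callable[[Any, int], None]:
    if every_n_batches <= 0:
        raise ValueError("every_n_batches must be positive")

    def handle(batch_df: Any, batch_id: int) -> None:
        # Apply this micro-batch first (e.g. the MERGE into the target table).
        write_batch(batch_df)
        # batch_id starts at 0 for a new query and increases by 1 per
        # micro-batch, so with every_n_batches=100 this fires after
        # batches 99, 199, 299, ...
        if (batch_id + 1) % every_n_batches == 0:
            optimize()

    return handle
```

You would pass the result to `stream_df.writeStream.foreachBatch(handler)`, with `write_batch` performing the MERGE (for example via `DeltaTable.merge`) and `optimize` calling something like `spark.sql("OPTIMIZE events")`. Because the stream's own MERGE and the OPTIMIZE run one after the other, they cannot conflict with each other; the trade-off is that the batch that triggers OPTIMIZE takes longer.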

See the Databricks documentation for the list of operations that can conflict:

https://docs.databricks.com/delta/concurrency-control.html#write-conflicts
