Databricks Community

User16826992666 · ‎06-25-2021

I have a table that I need to stream into continuously. I know it's best practice to run OPTIMIZE on my tables periodically. But if I never stop writing to the table, how and when can I run OPTIMIZE against it?

brickster_2018 · ‎06-25-2021

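If the streaming job is making blind appends to the Delta table, then it's perfectly fine to run an OPTIMIZE query in parallel. Blind appends only add new files and never read or rewrite existing ones, so they do not conflict with the file compaction that OPTIMIZE performs.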

However, if the streaming job is performing MERGE or UPDATE, it can conflict with a concurrent OPTIMIZE operation. In such cases, custom logic can be written to run OPTIMIZE as part of the streaming job itself, for example once every 100 batches.
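As a rough sketch of that idea (not a definitive implementation), the periodic OPTIMIZE could be wired into a `foreachBatch` handler. The names `make_batch_writer`, `merge_fn`, `optimize_every` and the table name `events` are hypothetical and only for illustration; `merge_fn` stands in for whatever MERGE/UPDATE logic your job already runs per micro-batch.

```python
def make_batch_writer(spark, table_name, merge_fn, optimize_every=100):
    """Build a foreachBatch handler that runs OPTIMIZE every N micro-batches.

    spark          -- the active SparkSession (anything exposing .sql(str))
    table_name     -- Delta table to compact (hypothetical name in this sketch)
    merge_fn       -- your existing per-batch MERGE/UPDATE logic (assumed)
    optimize_every -- how many batches to wait between OPTIMIZE runs
    """
    def process_batch(batch_df, batch_id):
        # Apply the MERGE/UPDATE for this micro-batch first.
        merge_fn(batch_df, batch_id)
        # Then compact on every Nth batch. Because this runs inside the same
        # micro-batch, it does not overlap with this stream's own MERGE.
        if batch_id > 0 and batch_id % optimize_every == 0:
            spark.sql(f"OPTIMIZE {table_name}")
    return process_batch

# Hypothetical usage inside a streaming job:
# (stream_df.writeStream
#      .foreachBatch(make_batch_writer(spark, "events", upsert_to_delta))
#      .start())
```

Keep in mind that the micro-batch that triggers OPTIMIZE will take longer than the others, so the interval is a trade-off between file compaction and occasional latency spikes. Other writers to the same table can still conflict with it.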

Check here for the list of operations that can conflict:

https://docs.databricks.com/delta/concurrency-control.html#write-conflicts