AutoOptimize, OPTIMIZE command and Vacuum command: Order, production implementation best practices

AP · 07-31-2022

Databricks gives us a great toolkit in the form of OPTIMIZE and VACUUM. But when it comes to operationalizing them, I am really confused about the best practice.

Should we enable "optimized writes" by setting the following at a workspace level?

spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "true") # optimized writes: fewer, larger files per write

spark.conf.set("spark.databricks.delta.autoCompact.enabled", "true") # auto compaction: merges small files after a write

OR

Should we explicitly run the OPTIMIZE command on tables and databases at a set frequency? Also, if we enable optimized writes at a workspace level, do we still have to run OPTIMIZE separately at a table level? Are they the same or different?

Once the OPTIMIZE question is settled, when should we run VACUUM? Should we run both OPTIMIZE and VACUUM in the same script? If not, what is the ideal order?
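
To make the question concrete, this is roughly the kind of combined script I have in mind. It is only a sketch: the helper names and the table name are hypothetical, the 168-hour retention is just the 7-day default written out, and running OPTIMIZE before VACUUM is my assumption, which is exactly what I would like confirmed.

```python
def maintenance_statements(tables, retain_hours=168):
    """Build the SQL for one maintenance pass (hypothetical helper).

    For each table, emit OPTIMIZE first and VACUUM second. The order is an
    assumption on my part, not something I have seen confirmed.
    """
    statements = []
    for table in tables:
        # Compact small files into larger ones.
        statements.append(f"OPTIMIZE {table}")
        # Delete unreferenced data files older than the retention window.
        statements.append(f"VACUUM {table} RETAIN {retain_hours} HOURS")
    return statements


def run_maintenance(spark, tables, retain_hours=168):
    """Run the statements through an existing SparkSession-like object."""
    for sql in maintenance_statements(tables, retain_hours):
        spark.sql(sql)
```

In a notebook this would be called as `run_maintenance(spark, ["my_db.my_table"])` from a scheduled job, where `my_db.my_table` is a placeholder.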
