Data Engineering
Delta table autoOptimize vs OPTIMIZE command

guruv
New Contributor III

Hi,

I have several Delta tables on an Azure ADLS Gen2 storage account, running Databricks Runtime 7.3. There are only write/read operations on the Delta tables, no updates/deletes.

As part of the release pipeline, the commands below are executed in a new notebook in the workspace, on a new cluster:

# enable optimized writes and auto compaction as the session default for new Delta tables
spark.sql('set spark.databricks.delta.properties.defaults.autoOptimize.optimizeWrite = true')
spark.sql('set spark.databricks.delta.properties.defaults.autoOptimize.autoCompact = true')
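As a hedged aside, these session-scoped defaults only apply to tables created while they are set; a minimal sketch of the per-table alternative for existing tables follows ('events' is a hypothetical table name, not one from the pipeline):

# Hedged sketch -- 'events' is a placeholder table name.
# The session defaults above only affect newly created tables;
# existing tables can be enabled explicitly via table properties:
spark.sql("""
    ALTER TABLE events SET TBLPROPERTIES (
        'delta.autoOptimize.optimizeWrite' = 'true',
        'delta.autoOptimize.autoCompact' = 'true'
    )
""")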

All my application jobs are triggered from different notebooks and on different clusters.

Questions:

  1. Is the above autoOptimize sufficient to optimize all the Delta tables, or should I periodically run OPTIMIZE {tableName} for each table?
  2. Is there a way to verify whether autoOptimize is working? When I execute a query on a Delta table, I still get a suggestion to run OPTIMIZE.

4 REPLIES

-werners-
Esteemed Contributor III

The auto optimize is sufficient, unless you run into performance issues.

Then I would trigger an OPTIMIZE. This will generate files of 1 GB (so larger than the standard size of auto optimize). And of course the Z-ordering if necessary.

The suggestion to run OPTIMIZE is probably a proposal to apply Z-ordering, because you use a highly selective filter in your notebook.

Z-ordering is a very interesting optimization technique, but one should check what the best ordering would be; depending on the case it can be worthwhile or not.

Auto optimize does not apply Z-ordering.

https://docs.microsoft.com/en-us/azure/databricks/delta/optimizations/auto-optimize
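For illustration, a minimal sketch of such a manual run from a notebook ('events' and 'event_date' are hypothetical placeholders, not names from this thread):

# Hedged sketch -- 'events' and 'event_date' are placeholders.
# OPTIMIZE bin-packs small files into ~1 GB files by default;
# ZORDER BY co-locates data on the chosen column(s) so highly
# selective filters on them can skip more files.
spark.sql('OPTIMIZE events ZORDER BY (event_date)')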

guruv
New Contributor III

Thanks for the confirmation.

Is there a way to verify that autoOptimize is actually doing the optimization?

I was thinking DESCRIBE HISTORY {tableName} would show some operation for autoOptimize running. But in my case all the Delta tables show only 1 day of history (we have not set anything explicitly), and in that there is only a "Write" operation.

-werners-
Esteemed Contributor III
(Accepted Solution)

The optimize runs while writing, so it is not shown in the DESCRIBE HISTORY output.

This has a cost of slower writes (but faster reads afterwards). There is always a cost to be paid...

You can check the file size of the current files. They should be more or less the same size (128 MB or 32 MB are the defaults, depending on the version).
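A minimal sketch of that check ('events' is a placeholder table name): DESCRIBE DETAIL reports the file count and total size of the current table snapshot, from which an average file size follows.

# Hedged sketch -- 'events' is a placeholder table name.
# DESCRIBE DETAIL returns a single row that includes numFiles and sizeInBytes.
detail = spark.sql('DESCRIBE DETAIL events').collect()[0]
avg_mb = detail['sizeInBytes'] / detail['numFiles'] / (1024 * 1024)
print(f"{detail['numFiles']} files, ~{avg_mb:.0f} MB per file on average")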

jose_gonzalez
Databricks Employee

Hi @guruv,

@Werner Stinckens is correct. Auto optimize will try to create files of 128 MB within each partition. On the other hand, an explicit OPTIMIZE will compact further and create files of 1 GB each (the default value). You can customize the default value according to your use case.
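For reference, a hedged sketch of customizing that target file size before a manual run (the config key is Databricks-specific; 'events' is a placeholder table name):

# Hedged sketch -- 'events' is a placeholder table name.
# spark.databricks.delta.optimize.maxFileSize sets the target size that
# OPTIMIZE bin-packs towards (1 GB by default on Databricks).
spark.conf.set('spark.databricks.delta.optimize.maxFileSize', str(256 * 1024 * 1024))  # 256 MB
spark.sql('OPTIMIZE events')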
