Data Engineering
Delta table autoOptimize vs OPTIMIZE command

guruv
New Contributor III

Hi,

I have several Delta tables on an Azure ADLS Gen2 storage account, running Databricks Runtime 7.3. There are only write/read operations on the Delta tables, no updates/deletes.

As part of the release pipeline, the commands below are executed in a new notebook in the workspace, on a new cluster:

# enable optimized writes and auto compaction as the session default for new Delta tables
spark.sql('set spark.databricks.delta.properties.defaults.autoOptimize.optimizeWrite = true')
spark.sql('set spark.databricks.delta.properties.defaults.autoOptimize.autoCompact = true')
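As a hedged aside, these session-scoped defaults only apply to tables created while they are set; a minimal sketch of the per-table alternative for existing tables follows ('events' is a hypothetical table name, not one from the pipeline):

# Hedged sketch -- 'events' is a placeholder table name.
# The session defaults above only affect newly created tables;
# existing tables can be enabled explicitly via table properties:
spark.sql("""
    ALTER TABLE events SET TBLPROPERTIES (
        'delta.autoOptimize.optimizeWrite' = 'true',
        'delta.autoOptimize.autoCompact' = 'true'
    )
""")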

All my application jobs are triggered from different notebooks and on different clusters.

Questions:

  1. Is the above autoOptimize sufficient to optimize all the Delta tables, or should I periodically run OPTIMIZE {tableName} for each table?
  2. Is there a way to verify whether autoOptimize is working? When I execute a query on a Delta table, I still get a suggestion to run OPTIMIZE.

4 REPLIES

-werners-
Esteemed Contributor III

The auto optimize is sufficient, unless you run into performance issues.

Then I would trigger an OPTIMIZE. This will generate files of 1 GB (so larger than the standard size of auto optimize). And of course the Z-ordering if necessary.

The suggestion to run OPTIMIZE is probably a proposal to apply Z-ordering, because you use a highly selective filter in your notebook.

Z-ordering is a very interesting optimization technique, but one should check what the best ordering would be; depending on the case it can be worthwhile or not.

Auto optimize does not apply Z-ordering.

https://docs.microsoft.com/en-us/azure/databricks/delta/optimizations/auto-optimize
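For illustration, a minimal sketch of such a manual run from a notebook ('events' and 'event_date' are hypothetical placeholders, not names from this thread):

# Hedged sketch -- 'events' and 'event_date' are placeholders.
# OPTIMIZE bin-packs small files into ~1 GB files by default;
# ZORDER BY co-locates data on the chosen column(s) so highly
# selective filters on them can skip more files.
spark.sql('OPTIMIZE events ZORDER BY (event_date)')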

guruv
New Contributor III

Thanks for the confirmation.

Is there a way to verify that autoOptimize is actually doing the optimization?

I was thinking DESCRIBE HISTORY {tableName} would show some operation for autoOptimize running. But in my case all the Delta tables show only 1 day of history (we have not set anything explicitly), and in that there is only a "Write" operation.

-werners-
Esteemed Contributor III
(Accepted Solution)

The optimize runs while writing, so it is not shown in the DESCRIBE HISTORY output.

This has a cost of slower writes (but faster reads afterwards). There is always a cost to be paid...

You can check the file size of the current files. They should be more or less the same size (128 MB or 32 MB are the defaults, depending on the version).
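A minimal sketch of that check ('events' is a placeholder table name): DESCRIBE DETAIL reports the file count and total size of the current table snapshot, from which an average file size follows.

# Hedged sketch -- 'events' is a placeholder table name.
# DESCRIBE DETAIL returns a single row that includes numFiles and sizeInBytes.
detail = spark.sql('DESCRIBE DETAIL events').collect()[0]
avg_mb = detail['sizeInBytes'] / detail['numFiles'] / (1024 * 1024)
print(f"{detail['numFiles']} files, ~{avg_mb:.0f} MB per file on average")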

jose_gonzalez
Databricks Employee

Hi @guruv,

@Werner Stinckens is correct. Auto optimize will try to create files of 128 MB within each partition. On the other hand, an explicit OPTIMIZE will compact further and create files of 1 GB each (the default value). You can customize the default value according to your use case.
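For reference, a hedged sketch of customizing that target file size before a manual run (the config key is Databricks-specific; 'events' is a placeholder table name):

# Hedged sketch -- 'events' is a placeholder table name.
# spark.databricks.delta.optimize.maxFileSize sets the target size that
# OPTIMIZE bin-packs towards (1 GB by default on Databricks).
spark.conf.set('spark.databricks.delta.optimize.maxFileSize', str(256 * 1024 * 1024))  # 256 MB
spark.sql('OPTIMIZE events')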
