
Delta table autoOptimize vs OPTIMIZE command

guruv
New Contributor III

Hi,

I have several Delta tables on an Azure ADLS Gen2 storage account, running Databricks Runtime 7.3. There are only write/read operations on the Delta tables, no updates/deletes.

As part of the release pipeline, the commands below are executed in a new notebook in the workspace, on a new cluster:

# Session-level defaults; they apply to Delta tables created while they are set
spark.sql('set spark.databricks.delta.properties.defaults.autoOptimize.optimizeWrite = true')
spark.sql('set spark.databricks.delta.properties.defaults.autoOptimize.autoCompact = true')
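
Note: since these are table-property defaults, they only take effect for tables created while they are set. For tables that already exist, the same properties can be set directly on the table; a minimal sketch (table name is hypothetical):

# Enable the same properties on an existing table (my_table is illustrative)
spark.sql("""
  ALTER TABLE my_table SET TBLPROPERTIES (
    delta.autoOptimize.optimizeWrite = true,
    delta.autoOptimize.autoCompact = true
  )
""")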

All my application jobs are triggered from different notebooks and on different clusters.

Questions:

  1. Is the above autoOptimize setting sufficient to keep all the Delta tables optimized, or should I periodically run OPTIMIZE {tableName} for each table?
  2. Is there a way to verify whether autoOptimize is working? When I execute a query on a Delta table, it suggests running OPTIMIZE.

5 REPLIES

Kaniz
Community Manager

Hi @guruv! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first; otherwise I will get back to you soon. Thanks.

-werners-
Esteemed Contributor III

Auto optimize is sufficient, unless you run into performance issues.

In that case I would trigger an OPTIMIZE. This generates files of 1 GB (larger than the standard size of auto optimize), and of course applies Z-ordering if necessary.

The suggestion to run OPTIMIZE is probably a proposal to apply Z-ordering, because you use a highly selective filter in your notebook.

Z-ordering is a very interesting optimization technique, but one should check which ordering would work best. So depending on the case it can be worthwhile or not.

Auto optimize does not apply Z-ordering.

https://docs.microsoft.com/en-us/azure/databricks/delta/optimizations/auto-optimize
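
For illustration, a manual optimize run looks like this (table and column names are hypothetical):

# Compact small files into ~1 GB files (events is an illustrative table name)
spark.sql("OPTIMIZE events")
# Optionally also co-locate data on a frequently filtered column
spark.sql("OPTIMIZE events ZORDER BY (eventDate)")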

guruv
New Contributor III

Thanks for the confirmation.

Is there a way to verify that autoOptimize is actually optimizing?

I was expecting DESCRIBE HISTORY {tableName} to show some operation for autoOptimize runs. But in my case all the Delta tables show only one day of history (we have not set anything explicitly), and the only operation listed is WRITE.
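
This is the kind of check I mean (table name is illustrative):

# Inspect the table's transaction log; the 'operation' column shows WRITE, OPTIMIZE, etc.
spark.sql("DESCRIBE HISTORY my_table").select("version", "timestamp", "operation").show()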

-werners-
Esteemed Contributor III
(Accepted solution)

The optimize runs while writing, so it does not show up in DESCRIBE HISTORY.

This comes at the cost of slower writes (but faster reads afterwards). There is always a cost to be paid...

You can check the file sizes of the current files: they should be more or less the same size (128 MB or 32 MB are the defaults, depending on the version).
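
For example (the path is hypothetical and assumes a non-partitioned table; adjust to your storage account):

# List the table's data files and print their sizes in MB
files = dbutils.fs.ls("abfss://container@account.dfs.core.windows.net/delta/my_table")
for f in files:
    if f.name.endswith(".parquet"):
        print(f.name, round(f.size / (1024 * 1024), 1), "MB")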

Hi @guruv,

@Werner Stinckens is correct. Auto optimize will try to create files of 128 MB within each partition. Explicit OPTIMIZE, on the other hand, compacts further and creates files of 1 GB each (the default value). You can customize that default according to your use case.
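
For example, the OPTIMIZE target file size can be tuned with a session configuration (the value shown is illustrative):

# Lower the OPTIMIZE target file size from the 1 GB default to 256 MB
spark.sql("set spark.databricks.delta.optimize.maxFileSize = 268435456")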
