11-30-2021 10:03 PM
Hi,
I have several Delta tables on an Azure ADLS Gen2 storage account, running Databricks Runtime 7.3. There are only write/read operations on the Delta tables, no updates/deletes.
As part of the release pipeline, the commands below are executed in a new notebook in the workspace, on a new cluster:
spark.sql('set spark.databricks.delta.properties.defaults.autoOptimize.optimizeWrite = true')
spark.sql('set spark.databricks.delta.properties.defaults.autoOptimize.autoCompact = true')
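For reference, the same behaviour can also be pinned to an individual table through table properties, so it does not depend on session configuration. A minimal sketch (my_table is a placeholder name):

spark.sql("""
    ALTER TABLE my_table SET TBLPROPERTIES (
        'delta.autoOptimize.optimizeWrite' = 'true',
        'delta.autoOptimize.autoCompact' = 'true'
    )
""")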
All my application jobs are triggered from different notebooks on different clusters.
Question: with these defaults set, do I still need to run the OPTIMIZE command on the tables explicitly, or is auto optimize sufficient?
12-01-2021 02:13 AM
Auto optimize is sufficient unless you run into performance issues.
In that case I would trigger an explicit OPTIMIZE. This generates files of 1 GB (so larger than the target size of auto optimize), plus of course the Z-ordering if necessary.
The suggestion to run OPTIMIZE is probably a proposal to apply Z-ordering, because you use a highly selective filter in your notebook.
Z-ordering is a very interesting optimization technique, but one should check what the best ordering columns would be; depending on the case it can be worthwhile or not.
Auto optimize does not apply Z-ordering.
https://docs.microsoft.com/en-us/azure/databricks/delta/optimizations/auto-optimize
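For example, a manual compaction with Z-ordering could look like this (a sketch; events and event_date are hypothetical table and column names):

spark.sql("OPTIMIZE events ZORDER BY (event_date)")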
12-01-2021 01:32 PM
Thanks for the confirmation.
Is there a way to verify that autoOptimize is actually optimizing?
I was thinking DESCRIBE HISTORY {tableName} would show some operation for autoOptimize running. But in my case all the Delta tables show only one day of history (we have not set anything explicitly), and the only operation listed is "WRITE".
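For reference, the check described above as a minimal sketch (tableName is the placeholder used above):

spark.sql(f"DESCRIBE HISTORY {tableName}").select("version", "timestamp", "operation").show(truncate=False)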
12-03-2021 01:24 AM
The optimize runs while writing, so it is not shown as a separate operation in DESCRIBE HISTORY.
This has a cost of slower writes (but faster reads afterwards). There is always a cost to be paid...
You can check the size of the current files: they should be more or less the same (128 MB or 32 MB are the defaults, depending on the version).
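A minimal sketch of such a check on Databricks (the ADLS path is hypothetical; note that partitioned tables keep their data files in subdirectories):

# Hypothetical table location; adjust to your storage account and path.
table_path = "abfss://container@account.dfs.core.windows.net/path/to/table"
for f in dbutils.fs.ls(table_path):
    if f.name.endswith(".parquet"):
        print(f.name, round(f.size / (1024 * 1024), 1), "MB")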
12-06-2021 04:40 PM
Hi @guruv,
@Werner Stinckens is correct. Auto optimize will try to create files of 128 MB within each partition. Explicit OPTIMIZE, on the other hand, compacts further and creates files of 1 GB each (the default value). You can customize that default according to your use case.
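A minimal sketch of overriding that target for explicit OPTIMIZE, assuming the session setting spark.databricks.delta.optimize.maxFileSize (value in bytes):

# Assumption: lower the OPTIMIZE target file size to 256 MB for this session.
spark.conf.set("spark.databricks.delta.optimize.maxFileSize", 256 * 1024 * 1024)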