The ANALYZE command captures the statistics that the Cost Based Optimizer (CBO) uses to make better decisions.
The statistics that Delta auto-collects on the first 32 columns are specifically for data skipping. They are separate from what the ANALYZE command collects.
The docs currently say "Do not run on Delta tables" because it's best to run ANALYZE on Delta tables only after a data update/delete operation has completed and once the data has changed by around 10%. This gives the CBO the best and most up-to-date statistics to work with.
General best practices:
Run ANALYZE whenever the data has changed by about 10%.
When you use ANALYZE, specify the COLUMNS or PARTITIONS you want to collect statistics for. Otherwise, as you have noted, it will re-analyze the entire table.
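To make the two practices above concrete, here is a minimal sketch in Python that checks the roughly-10% rule and builds a scoped ANALYZE statement. The helper names (should_analyze, build_analyze_sql), the table name sales.orders and the column names are all hypothetical, and how you count changed rows is left to you. Whether the PARTITION clause is accepted depends on your table format and runtime, so check the docs for your version before relying on it.

```python
from typing import Mapping, Optional, Sequence


def should_analyze(rows_changed: int, total_rows: int, threshold: float = 0.10) -> bool:
    """Return True once the changed fraction reaches the threshold (about 10% here)."""
    if total_rows <= 0:
        # Empty or unknown size: refresh only if something actually changed.
        return rows_changed > 0
    return rows_changed / total_rows >= threshold


def build_analyze_sql(
    table: str,
    columns: Optional[Sequence[str]] = None,
    partition: Optional[Mapping[str, str]] = None,
) -> str:
    """Build an ANALYZE TABLE statement limited to the given columns/partition."""
    stmt = f"ANALYZE TABLE {table}"
    if partition:
        # Assumption: string-typed partition values; support varies by table format.
        spec = ", ".join(f"{key} = '{value}'" for key, value in partition.items())
        stmt += f" PARTITION ({spec})"
    stmt += " COMPUTE STATISTICS"
    if columns:
        # Scoping to named columns avoids recomputing stats for every column.
        stmt += " FOR COLUMNS " + ", ".join(columns)
    return stmt


if __name__ == "__main__":
    # Hypothetical numbers: 150 of 1,000 rows changed since the last ANALYZE.
    if should_analyze(rows_changed=150, total_rows=1000):
        sql = build_analyze_sql("sales.orders", columns=["customer_id", "order_date"])
        print(sql)
        # On a cluster you would pass this string to spark.sql(sql).
```

The statement is built as a plain string so the logic can be checked without a Spark session; on Databricks you would hand the result to spark.sql or paste it into a SQL cell.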
Super write-up; very useful in understanding how the Delta and non-Delta approaches have evolved.