07-04-2025 03:46 AM
Thank you for your earlier reply, which was helpful. I'm still looking for a clearer explanation, so I'm asking again in more detail.
I'm working with external Delta tables that don't use Predictive I/O optimization. To enable data skipping on the relevant columns only, I set the delta.dataSkippingStatsColumns table property to those columns and ran ANALYZE TABLE <table_name> COMPUTE DELTA STATISTICS.

While looking for further ways to improve query performance, I found a recommendation to run ANALYZE TABLE <table_name> COMPUTE STATISTICS periodically, especially after a significant portion of the data has been refreshed. I tried it on one table, and it finished very quickly, reporting only the row count and table size.

This leads me to a few questions. How does this lightweight command actually help the query optimizer? How does the optimizer work, and in what ways do these statistics improve performance? And does running ANALYZE TABLE <table_name> COMPUTE STATISTICS FOR COLUMNS col1, col2, col3... (on the same columns used for data skipping) provide meaningful benefits over the plain version?
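To make the question more concrete, here is my current, simplified mental model as a Python sketch. It is based on my own assumptions, not on Spark's actual code. I'm assuming that table-level statistics (row count, size in bytes) feed decisions such as whether one side of a join is small enough to broadcast (compared against spark.sql.autoBroadcastJoinThreshold, 10 MB by default in open-source Spark), and that column statistics (min, max, distinct count) let the optimizer estimate how much a filter shrinks the data. The names TableStats, ColumnStats, filter_selectivity, estimated_size and plan_join are hypothetical, and real Spark column statistics also include null counts, lengths and optional histograms.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical, simplified model of what ANALYZE TABLE records.
# This is NOT Spark's implementation; it only illustrates my assumptions.

# Assumed broadcast limit: open-source Spark default for spark.sql.autoBroadcastJoinThreshold.
BROADCAST_THRESHOLD = 10 * 1024 * 1024


@dataclass
class ColumnStats:
    # Assumed to come from COMPUTE STATISTICS FOR COLUMNS ...
    min_value: float
    max_value: float
    distinct_count: int


@dataclass
class TableStats:
    # Assumed to come from plain COMPUTE STATISTICS.
    row_count: int
    size_in_bytes: int
    columns: Dict[str, ColumnStats] = field(default_factory=dict)


def filter_selectivity(stats: TableStats, column: str, op: str, literal: float) -> float:
    """Estimate the fraction of rows kept by `column <op> literal`."""
    col = stats.columns.get(column)
    if col is None:
        # Without column stats, assume the filter keeps every row.
        return 1.0
    if op == "=":
        if literal < col.min_value or literal > col.max_value:
            return 0.0  # literal outside [min, max]: nothing matches
        return 1.0 / col.distinct_count  # assume all values are equally frequent
    if op == "<":
        if literal <= col.min_value:
            return 0.0
        if literal > col.max_value:
            return 1.0
        # Assume values are spread uniformly between min and max.
        return (literal - col.min_value) / (col.max_value - col.min_value)
    raise ValueError(f"unsupported operator: {op}")


def estimated_size(stats: TableStats, column: str, op: str, literal: float) -> float:
    """Estimated bytes remaining after the filter is applied."""
    return stats.size_in_bytes * filter_selectivity(stats, column, op, literal)


def plan_join(stats: TableStats, column: str, op: str, literal: float) -> str:
    """Pick a join strategy for the filtered side based on its estimated size."""
    if estimated_size(stats, column, op, literal) <= BROADCAST_THRESHOLD:
        return "broadcast hash join"
    return "sort-merge join"


if __name__ == "__main__":
    # 1 GiB table, 10 million rows; region_id has 200 distinct values in [1, 200].
    table_only = TableStats(row_count=10_000_000, size_in_bytes=1024**3)
    with_columns = TableStats(
        row_count=10_000_000,
        size_in_bytes=1024**3,
        columns={"region_id": ColumnStats(min_value=1, max_value=200, distinct_count=200)},
    )
    print(plan_join(table_only, "region_id", "=", 5))    # sort-merge join
    print(plan_join(with_columns, "region_id", "=", 5))  # broadcast hash join
```

In this toy model, the plain command's row count and size alone never change the estimate for a filtered scan; only the column statistics let the filtered side (about 5 MB here) fall under the broadcast threshold. Whether Databricks actually behaves this way is part of what I'm asking.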