Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Data Skipping - Partitioned Tables

Sainath368
New Contributor III

Hi all,

I have a question: how can we modify delta.dataSkippingStatsColumns and compute statistics for a partitioned Delta table in Databricks? I want to understand the process and best practices for changing this setting and ensuring accurate statistics for partitioned data. Any guidance would be appreciated.

1 REPLY

paolajara
Databricks Employee

Hi, delta.dataSkippingStatsColumns specifies a comma-separated list of column names for which Delta Lake collects statistics. It can improve performance by limiting statistics collection to just those columns, superseding the default behavior of analyzing the first 32 columns of the table.
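For instance, the property can also be set at table creation time. A minimal sketch on a partitioned Delta table (the table and column names here are illustrative, not from the original question):

```sql
-- Hypothetical partitioned Delta table; statistics are collected only
-- on event_ts and amount rather than on the first 32 columns.
CREATE TABLE sales_events (
  event_id BIGINT,
  event_ts TIMESTAMP,
  country  STRING,
  amount   DOUBLE
)
USING DELTA
PARTITIONED BY (country)
TBLPROPERTIES ('delta.dataSkippingStatsColumns' = 'event_ts,amount');
```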

  1. Set the property when creating or modifying a table: 
    1. ALTER TABLE your_table_name
      SET TBLPROPERTIES ('delta.dataSkippingStatsColumns' = 'col1,col2,col3'); 
  2. On Databricks Runtime (DBR) 14.3 LTS and above, you can manually compute statistics for existing and future data:
    1. ANALYZE TABLE your_table_name COMPUTE DELTA STATISTICS;
  3. Avoid long string columns, since they are expensive to analyze. These should be excluded from delta.dataSkippingStatsColumns.
  4. The columns you specify for statistics should overlap with your filtering criteria, e.g. partition columns or high-cardinality columns used in query predicates. Statistics on unused or rarely filtered columns waste compute resources without significant benefit.
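Putting the steps above together, a sketch of the full sequence on an existing partitioned table (your_table_name is a placeholder; the column names are illustrative):

```sql
-- Change which columns Delta Lake collects statistics on. This affects
-- files written after the change; existing files keep their old stats.
ALTER TABLE your_table_name
  SET TBLPROPERTIES ('delta.dataSkippingStatsColumns' = 'event_ts,amount');

-- On DBR 14.3 LTS and above, recompute statistics for existing files
-- so they match the new column list.
ANALYZE TABLE your_table_name COMPUTE DELTA STATISTICS;

-- Confirm the property took effect.
SHOW TBLPROPERTIES your_table_name ('delta.dataSkippingStatsColumns');
```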
