Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

What is the upper bound for dataSkippingNumIndexedCols when keeping stats in the Delta log?

chhavibansal
New Contributor II

Is there an upper bound on the value I can assign to delta.dataSkippingNumIndexedCols for computing statistics? Is there a tradeoff benchmark available for increasing this number beyond 32?

1 REPLY

Anonymous

@Chhavi Bansal:

The delta.dataSkippingNumIndexedCols table property controls how many columns Delta Lake collects statistics on (per-file minimum, maximum, and null count) for data skipping. The statistics are gathered at write time for the first N columns in the table schema and stored in the transaction log. By default, N is 32, and a value of -1 collects statistics for all columns. There is no hard upper bound on the value, but a very large number can hurt performance and memory usage: every write has to compute more statistics, and each file entry in the log grows, which matters most for wide tables and long string columns.

The optimal value depends on the characteristics of your data and the workload you are running. As a rule of thumb, set delta.dataSkippingNumIndexedCols equal to or slightly larger than the number of columns you expect to use commonly in filter predicates, and make sure those columns fall within the first N positions of the schema. You can also adjust the value based on the size of your data and the resources available to your cluster.
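For reference, here is a minimal sketch of how the property could be changed on an existing table. The helper name `set_indexed_cols_sql` and the table name `events` are made up for illustration; only the property name comes from the discussion above. The helper just builds the `ALTER TABLE ... SET TBLPROPERTIES` statement, so it runs without a cluster.

```python
def set_indexed_cols_sql(table_name: str, num_cols: int) -> str:
    """Build the ALTER TABLE statement that sets delta.dataSkippingNumIndexedCols.

    `table_name` is a hypothetical table; -1 means "collect stats on all columns".
    """
    if num_cols < -1:
        raise ValueError("num_cols must be -1 (all columns) or a non-negative integer")
    return (
        f"ALTER TABLE {table_name} "
        f"SET TBLPROPERTIES ('delta.dataSkippingNumIndexedCols' = '{num_cols}')"
    )


# Example: raise the limit from the default 32 to 40 on a hypothetical table.
stmt = set_indexed_cols_sql("events", 40)
print(stmt)

# Assumption: an active SparkSession named `spark` with Delta Lake available.
# spark.sql(stmt)
```

As far as I know, changing the property only affects files written afterwards; statistics for existing files are not recomputed automatically, so keep that in mind when comparing before and after.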

As for the tradeoff benchmark, I am not aware of any published benchmark for this configuration property. However, you can run your own workload with several values of the property and compare write time, query time, and memory usage to find the best value for your use case.
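One way to put a number on the overhead is to look at the statistics that end up in the transaction log. The sketch below parses a single line of a `_delta_log` commit file and reports how many top-level columns carry statistics and how many bytes the stats string takes. The function name `summarize_add_stats` and the sample data are invented for illustration; the layout (an `add` action whose `stats` field is a JSON string with `numRecords`, `minValues`, `maxValues`, `nullCount`) follows the Delta log format. The column count is approximate, since nested fields sit under their parent key.

```python
import json


def summarize_add_stats(commit_line: str):
    """Summarize the stats of one `add` action; return None for other actions."""
    action = json.loads(commit_line)
    add = action.get("add")
    if add is None or not add.get("stats"):
        return None
    # `stats` is itself a JSON-encoded string inside the action.
    stats = json.loads(add["stats"])
    return {
        "path": add["path"],
        "num_records": stats.get("numRecords"),
        # Top-level columns that have a recorded minimum value.
        "indexed_cols": len(stats.get("minValues", {})),
        # Size of the stats payload stored in the log for this file.
        "stats_bytes": len(add["stats"].encode("utf-8")),
    }


# Made-up sample commit line with stats on two columns.
sample_stats = {
    "numRecords": 3,
    "minValues": {"id": 1, "ts": "2023-01-01"},
    "maxValues": {"id": 3, "ts": "2023-01-03"},
    "nullCount": {"id": 0, "ts": 0},
}
sample_line = json.dumps(
    {"add": {"path": "part-00000.parquet", "size": 1024, "stats": json.dumps(sample_stats)}}
)

summary = summarize_add_stats(sample_line)
print(summary)
```

Running this over the commit files written with different property values gives a rough picture of how the log grows per data file, which you can weigh against any improvement in query time.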
