
How can I use data skipping with Delta Lake

Srikanth_Gupta_
Valued Contributor

How does data skipping work with Delta Lake? Can I run ANALYZE TABLE COMPUTE STATISTICS on a Delta table, or is Z-ordering going to solve these problems?


sajith_appukutt
Honored Contributor II

You do not need to configure data skipping for Delta Lake; it is used automatically whenever applicable.

How effective data skipping is depends on the data layout, and you can apply Z-ordering for best results.
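To see why layout matters, here is a toy Python sketch of the Z-order idea. It is not how Delta Lake implements `OPTIMIZE table_name ZORDER BY (col1, col2)` internally; all function names are hypothetical. It splits 256 rows into 16 "files" of 16 rows and counts how many files a min/max filter on `x` or on `y` must still read. Sorting by `x` alone makes `x` filters cheap but forces every file to be read for a `y` filter, while Z-ordering (interleaving the bits of both columns) gives a balanced result for both.

```python
from itertools import product

def interleave_bits(x: int, y: int, bits: int = 8) -> int:
    """Morton (Z-order) code: interleave the bits of x and y (x in even positions)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

def file_ranges(rows, rows_per_file, col):
    """Split rows into consecutive 'files' and return the (min, max) of one column per file."""
    ranges = []
    for start in range(0, len(rows), rows_per_file):
        values = [row[col] for row in rows[start:start + rows_per_file]]
        ranges.append((min(values), max(values)))
    return ranges

def files_scanned(rows, rows_per_file=16, value=5):
    """Files min/max skipping must still read for x = value and for y = value."""
    return tuple(
        sum(1 for lo, hi in file_ranges(rows, rows_per_file, col) if lo <= value <= hi)
        for col in (0, 1)
    )

# 256 rows covering every (x, y) pair with 0 <= x, y < 16.
rows = list(product(range(16), range(16)))

linear = sorted(rows)                                       # ordered by x, then y
zordered = sorted(rows, key=lambda r: interleave_bits(*r))  # ordered along the Z-curve

print("linear  (x, y):", files_scanned(linear))    # (1, 16)
print("z-order (x, y):", files_scanned(zordered))  # (4, 4)
```

With Z-ordering, each file covers a compact 4x4 tile of the (x, y) space, so its min/max ranges stay narrow in both columns.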

Anonymous
New Contributor III

You can use Z-ordering together with the statistics that Delta Lake collects for data skipping. These statistics are collected automatically when you write to a Delta table, and Delta Lake uses them at query time to skip files that cannot contain matching rows, which makes queries faster.
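A minimal Python sketch of the idea, assuming simple in-memory rows: Delta Lake records per-file statistics (such as record count and per-column min/max values) in its transaction log, and a filter only needs to read files whose [min, max] range overlaps the predicate. The `FileStats` class, `collect_stats`, and `prune` are hypothetical names, and the sketch ignores nulls and nested columns.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class FileStats:
    """Hypothetical per-file statistics, loosely modeled on what a write records."""
    path: str
    num_records: int
    min_values: dict[str, Any]
    max_values: dict[str, Any]

def collect_stats(path, rows):
    """Compute per-column min/max for one data file at write time."""
    columns = rows[0].keys()
    return FileStats(
        path=path,
        num_records=len(rows),
        min_values={c: min(r[c] for r in rows) for c in columns},
        max_values={c: max(r[c] for r in rows) for c in columns},
    )

def prune(files, column, lo, hi):
    """Keep only files whose [min, max] for `column` overlaps the query range [lo, hi]."""
    return [
        f.path for f in files
        if f.max_values[column] >= lo and f.min_values[column] <= hi
    ]

files = [
    collect_stats("part-00000", [{"id": 1, "amount": 10}, {"id": 5, "amount": 50}]),
    collect_stats("part-00001", [{"id": 6, "amount": 20}, {"id": 9, "amount": 90}]),
    collect_stats("part-00002", [{"id": 10, "amount": 5}, {"id": 14, "amount": 70}]),
]

print(prune(files, "id", 7, 8))          # ['part-00001']
print(prune(files, "amount", 60, 100))   # ['part-00001', 'part-00002']
```

The query engine never opens the pruned files; their statistics alone prove they hold no matching rows.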

You don't need to configure anything for data skipping, because the feature is used automatically whenever applicable. However, its effectiveness depends on the layout of the data. By default, Delta Lake collects statistics on the first 32 columns of the table; you can change this with the table property delta.dataSkippingNumIndexedCols. Collecting statistics on more columns adds overhead when writing files.
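The "first N columns" rule can be sketched in a few lines of Python, assuming a flat schema. The function names are hypothetical, and the cost model (one min and one max value per indexed column) is a simplifying assumption. A value of -1 for delta.dataSkippingNumIndexedCols means "all columns" per the Delta Lake table-property docs. On a real table the property is set with SQL such as `ALTER TABLE my_table SET TBLPROPERTIES ('delta.dataSkippingNumIndexedCols' = '3')`, where `my_table` is a placeholder.

```python
# Delta Lake's default for the delta.dataSkippingNumIndexedCols table property.
DEFAULT_NUM_INDEXED_COLS = 32

def stats_columns(schema, num_indexed_cols=DEFAULT_NUM_INDEXED_COLS):
    """Columns that receive statistics: the first num_indexed_cols, or all if -1."""
    if num_indexed_cols == -1:
        return list(schema)
    return list(schema[:num_indexed_cols])

def stats_values_per_file(schema, num_indexed_cols=DEFAULT_NUM_INDEXED_COLS):
    """Rough write-time cost model (an assumption): one min and one max per indexed column."""
    return 2 * len(stats_columns(schema, num_indexed_cols))

schema = [f"c{i}" for i in range(40)]  # hypothetical flat 40-column table
print(len(stats_columns(schema)))          # 32
print(stats_columns(schema, 3))            # ['c0', 'c1', 'c2']
print(stats_values_per_file(schema, -1))   # 80
```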

Collecting statistics on long string columns is expensive, so you should avoid doing it. You can either set the table property delta.dataSkippingNumIndexedCols so that such columns fall outside the indexed columns, or move those columns to a position beyond delta.dataSkippingNumIndexedCols in the table schema.
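Both options can be combined, as in this hypothetical Python helper: move the expensive columns (for example, long free-text strings) behind the others, then cap the indexed prefix so it never reaches them. The function name and the example columns are illustrative only. On a real table, column order can be changed with SQL along the lines of `ALTER TABLE my_table ALTER COLUMN payload AFTER country`; check the Delta Lake documentation for your version.

```python
def plan_stats_layout(schema, expensive, num_indexed_cols=32):
    """Return (column_order, num_indexed_cols) so that no 'expensive' column is indexed."""
    # -1 means "index every column", so treat it as the full schema width.
    limit = len(schema) if num_indexed_cols == -1 else num_indexed_cols
    cheap = [c for c in schema if c not in expensive]
    costly = [c for c in schema if c in expensive]
    # Expensive columns go last; the indexed prefix is shrunk if it would reach them.
    return cheap + costly, min(limit, len(cheap))

order, n = plan_stats_layout(["id", "payload", "ts", "country"], expensive={"payload"})
print(order)  # ['id', 'ts', 'country', 'payload']
print(n)      # 3
```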
