
How can I use data skipping with Delta Lake

Srikanth_Gupta_ — Thu, 10 Jun 2021 04:12:01 GMT

How does data skipping work with Delta Lake? Can I run ANALYZE TABLE COMPUTE STATISTICS on a Delta table, or is Z-ordering going to solve these problems?

Re: How can I use data skipping with Delta Lake

sajith_appukutt — Fri, 18 Jun 2021 00:00:38 GMT

You do not need to configure data skipping for Delta Lake; it is used whenever applicable.

The effectiveness of data skipping depends on the layout of the data, and you can apply Z-ordering for best results.
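To show why layout matters, here is a rough sketch in plain Python (not Delta Lake code). It assumes each file carries min/max statistics for one column and that a file can be skipped when its range does not overlap the filter. The FileStats class, the file names and the value ranges are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical per-file min/max statistics for a single column."""
    name: str
    min_value: int
    max_value: int


def files_to_scan(files, lo, hi):
    """Keep only files whose [min, max] range overlaps the filter lo <= x <= hi."""
    return [f.name for f in files if f.max_value >= lo and f.min_value <= hi]


# Made-up layout 1: values are scattered, so every file spans a wide range.
scattered = [
    FileStats("part-0", 1, 990),
    FileStats("part-1", 5, 1000),
    FileStats("part-2", 2, 980),
]

# Made-up layout 2: the same data clustered on the column, so ranges are narrow.
clustered = [
    FileStats("part-0", 1, 333),
    FileStats("part-1", 334, 666),
    FileStats("part-2", 667, 1000),
]

# Filter: 400 <= x <= 450
scattered_hits = files_to_scan(scattered, 400, 450)  # nothing can be skipped
clustered_hits = files_to_scan(clustered, 400, 450)  # only one file overlaps
print(scattered_hits)
print(clustered_hits)
```

With the scattered layout every file has to be read, while the clustered layout lets two of the three files be skipped. Z-ordering aims for the second kind of layout on the columns you filter by.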

Re: How can I use data skipping with Delta Lake

Anonymous — Wed, 28 Jun 2023 02:52:31 GMT

You can use Z-ordering to make data skipping more effective. Data skipping information is collected automatically when you write to a Delta table.
Delta Lake uses this information to answer queries faster.

You don't need to configure anything for data skipping; the feature is applied automatically whenever applicable. However, its effectiveness depends on the layout of the data. By default, Delta Lake collects statistics on the first 32 columns, which can be changed using the table property delta.dataSkippingNumIndexedCols. Collecting statistics on more columns adds overhead when you write files.

Collecting statistics on long strings is expensive, so it is best avoided. You can either set the table property delta.dataSkippingNumIndexedCols so that such columns are excluded, or move columns containing long strings to a position beyond the first delta.dataSkippingNumIndexedCols columns.
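As a rough sketch of the statements involved, the helpers below only build SQL strings. In a notebook you would pass each string to spark.sql(...), which is assumed here and not shown. The table name events and the columns payload, event_date and event_id are hypothetical.

```python
def set_indexed_cols_sql(table: str, num_cols: int) -> str:
    """Limit statistics collection to the first num_cols columns of the table."""
    return (
        f"ALTER TABLE {table} "
        f"SET TBLPROPERTIES ('delta.dataSkippingNumIndexedCols' = '{num_cols}')"
    )


def move_column_after_sql(table: str, column: str, after_column: str) -> str:
    """Move a column (e.g. a long string) so it sits after another column."""
    return f"ALTER TABLE {table} ALTER COLUMN {column} AFTER {after_column}"


def zorder_sql(table: str, columns: list) -> str:
    """Rewrite the table's files so they are clustered on the given columns."""
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(columns)})"


# Hypothetical table and column names, for illustration only.
stats_stmt = set_indexed_cols_sql("events", 5)
move_stmt = move_column_after_sql("events", "payload", "event_date")
zorder_stmt = zorder_sql("events", ["event_date", "event_id"])

for stmt in (stats_stmt, move_stmt, zorder_stmt):
    print(stmt)  # in a notebook: spark.sql(stmt)
```

If payload is the long string column, moving it after the last of the first five columns puts it outside the range covered by delta.dataSkippingNumIndexedCols = 5, so statistics are no longer collected for it on new writes.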