How can I use data skipping with Delta Lake?
06-09-2021 09:12 PM
How does data skipping work with Delta Lake? Can I run ANALYZE TABLE COMPUTE STATISTICS on a Delta table, or is Z-ordering going to solve these problems?
- Labels: DataSkipping, Delta, Z-ordering
06-17-2021 05:00 PM
You do not need to configure data skipping for Delta Lake; it is used automatically whenever it is applicable.
How effective data skipping is depends on the data layout, and you can apply Z-ordering for best results.
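To illustrate why the layout matters, here is a toy model in plain Python. It is only a sketch and not Delta Lake's actual implementation: the FileStats class, the file names, and the single integer column are all made up for the example. The idea it shows is that a file can be skipped when its recorded min/max range cannot contain any row matching the filter.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical per-file statistics: min and max of a single column."""
    path: str
    min_value: int
    max_value: int


def build_files(values, rows_per_file):
    """Split values into fixed-size 'files' and record min/max for each one."""
    files = []
    for start in range(0, len(values), rows_per_file):
        chunk = values[start:start + rows_per_file]
        name = f"part-{start // rows_per_file}"
        files.append(FileStats(name, min(chunk), max(chunk)))
    return files


def files_to_scan(files, lo, hi):
    """Keep only files whose [min, max] range overlaps the filter lo <= x <= hi."""
    return [f.path for f in files if f.max_value >= lo and f.min_value <= hi]


# Clustered layout: each file holds a contiguous range of values.
clustered = build_files(list(range(100)), 25)

# Scattered layout: the same 100 values, interleaved so that every file
# spans most of the value range.
scattered = build_files([(i % 4) * 25 + i // 4 for i in range(100)], 25)

print(files_to_scan(clustered, 30, 40))  # one file has to be read
print(files_to_scan(scattered, 30, 40))  # every file has to be read
```

With the clustered layout only one of the four files overlaps the filter, while with the scattered layout none can be skipped. Z-ordering aims to produce the first kind of layout for the columns you filter on.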
06-27-2023 07:52 PM
You can use Z-order indexes together with data skipping. Data skipping information is collected automatically when you write to a Delta table.
Delta Lake uses this information at query time to make queries faster.
You don't need to configure anything for data skipping, as this feature is activated whenever applicable. However, its effectiveness depends on the layout of the data. By default, Delta Lake collects statistics on the first 32 columns of the table (this can be changed with the table property delta.dataSkippingNumIndexedCols). Collecting statistics on more columns adds overhead when you write files.
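As a rough sketch of how this could look in practice, the Python helpers below only build the SQL text; the function names and the table and column names (events, event_date, user_id) are made up for the example. Running the statements assumes a Spark session with Delta Lake available, so those calls are left as comments.

```python
def set_indexed_cols_sql(table, num_cols):
    """Build the statement that sets delta.dataSkippingNumIndexedCols on a table."""
    if not isinstance(num_cols, int):
        raise ValueError("num_cols must be an integer")
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES "
        f"('delta.dataSkippingNumIndexedCols' = '{num_cols}')"
    )


def zorder_sql(table, columns):
    """Build an OPTIMIZE ... ZORDER BY statement for the given columns."""
    if not columns:
        raise ValueError("at least one Z-order column is required")
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(columns)})"


# Hypothetical table and columns, for illustration only.
print(set_indexed_cols_sql("events", 8))
print(zorder_sql("events", ["event_date", "user_id"]))

# In a notebook with an active SparkSession named `spark` (an assumption),
# the statements could be executed like this:
# spark.sql(set_indexed_cols_sql("events", 8))
# spark.sql(zorder_sql("events", ["event_date", "user_id"]))
```

Z-ordering on the columns you filter by most often, and keeping those columns within the indexed range, is what lets the collected statistics actually prune files.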
Collecting statistics on long strings is an expensive operation, so it is best avoided. You can either set the table property delta.dataSkippingNumIndexedCols low enough to exclude such columns, or move the columns containing long strings to a position in the schema beyond delta.dataSkippingNumIndexedCols.
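Here is a small sketch of the column-position idea, assuming a flat schema (no nested fields) and made-up column names. The helper functions are hypothetical: stats_columns models "statistics are collected on the first N columns", and move_after_sql builds an ALTER TABLE ... ALTER COLUMN ... AFTER statement that could be used to reorder a column.

```python
def stats_columns(columns, num_indexed_cols):
    """Return the columns that fall within the first num_indexed_cols positions."""
    return columns[:num_indexed_cols]


def move_to_end(columns, column):
    """Return a new column order with the given column moved to the last position."""
    return [c for c in columns if c != column] + [column]


def move_after_sql(table, column, after_column):
    """Build the statement that repositions a column after another column."""
    return f"ALTER TABLE {table} ALTER COLUMN {column} AFTER {after_column}"


# Hypothetical schema: payload_json is a long string column.
columns = ["id", "payload_json", "event_date", "user_id"]

# With the property set to 3, the long string column would get statistics.
print(stats_columns(columns, 3))

# After moving it past the indexed range, it no longer does.
reordered = move_to_end(columns, "payload_json")
print(stats_columns(reordered, 3))

# The matching SQL, to be run against the table (table name is an assumption).
print(move_after_sql("events", "payload_json", "user_id"))
```

Either approach works; reordering keeps statistics on the other columns, while lowering the property is simpler when the long string columns already sit near the end of the schema.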