Liquid Clustering
11-25-2024 03:57 AM
How can I use a column for liquid clustering if it is not among the first 32 columns of my Delta table schema?
11-25-2024 05:49 AM
Clustering keys can only be columns that have statistics collected. By default, statistics are collected for the first 32 columns in a Delta table. See Specify Delta statistics columns.
You can use the following workaround for your use case:
1. Use the table property below to specify the column name that you want to use for liquid clustering:
delta.dataSkippingStatsColumns
This property specifies a list of column names for which Delta Lake collects statistics. It supersedes dataSkippingNumIndexedCols.
Table properties can be set at table creation or with ALTER TABLE statements. See Delta table properties reference.
2. Then run the query below to collect statistics for that column:
ANALYZE TABLE table_name COMPUTE DELTA STATISTICS
3. Use the column in the CLUSTER BY clause for liquid clustering.
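Put together, the three steps might look like the sketch below. The table name `my_table` and the column names `event_ts` and `id` are hypothetical placeholders, not taken from your schema. Because delta.dataSkippingStatsColumns replaces the default first-32-columns behavior, the list should include every column you still want statistics on, not only the new clustering column.

```sql
-- Hypothetical names: my_table, event_ts, id. Replace with your own.

-- Step 1: list the columns that should have statistics collected.
-- This overrides the default of the first 32 columns, so include
-- any other columns you still want data skipping on.
ALTER TABLE my_table
SET TBLPROPERTIES ('delta.dataSkippingStatsColumns' = 'id,event_ts');

-- Step 2: recompute statistics for existing data files.
ALTER TABLE my_table SET TBLPROPERTIES ('delta.dataSkippingStatsColumns' = 'id,event_ts');
ANALYZE TABLE my_table COMPUTE DELTA STATISTICS;

-- Step 3: use the column as a clustering key.
ALTER TABLE my_table CLUSTER BY (event_ts);

-- Assumption: existing data is only reclustered when OPTIMIZE runs.
OPTIMIZE my_table;
```

The repeated ALTER TABLE in step 2 is harmless and only there so that step can be run on its own; if step 1 has already run, the ANALYZE statement alone is enough.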

