Only columns that have statistics collected can be specified as clustering keys. By default, statistics are collected for the first 32 columns in a Delta table. See Specify Delta statistics columns.
We can use the following workaround for your use case:
1. Use the following table property to specify the column that you want to use for liquid clustering:
delta.dataSkippingStatsColumns
The above property specifies a comma-separated list of column names for which Delta Lake collects statistics. It supersedes dataSkippingNumIndexedCols.
Table properties can be set at table creation or with ALTER TABLE statements. See Delta table properties reference.
2. Then run the following query to collect statistics for that column:
ANALYZE TABLE table_name COMPUTE DELTA STATISTICS
3. Use the column in the CLUSTER BY clause for liquid clustering.
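
Put together, the three steps might look roughly like the sketch below. The names table_name and my_col are placeholders (my_col is a hypothetical column, not one from your table), and this assumes a Databricks Runtime version that supports both ANALYZE TABLE ... COMPUTE DELTA STATISTICS and liquid clustering, so adjust it to your environment:

```sql
-- Step 1: tell Delta Lake to collect statistics for the column
-- (my_col is a placeholder; list several columns separated by commas)
ALTER TABLE table_name
SET TBLPROPERTIES ('delta.dataSkippingStatsColumns' = 'my_col');

-- Step 2: recompute statistics so existing data files pick up the new setting
ANALYZE TABLE table_name COMPUTE DELTA STATISTICS;

-- Step 3: use the column as a liquid clustering key
ALTER TABLE table_name CLUSTER BY (my_col);
```

Note that setting delta.dataSkippingStatsColumns replaces the default "first 32 columns" behavior, so you may want to include any other columns you still rely on for data skipping in the same list.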