Liquid Clustering

hrishiharsh25 — Mon, 25 Nov 2024 11:57:58 GMT

How can I use a column for liquid clustering if it is not among the first 32 columns of my Delta table schema?

Re: Liquid Clustering

PotnuruSiva — Mon, 25 Nov 2024 13:49:55 GMT

Clustering keys can only be columns that have statistics collected. By default, statistics are collected for the first 32 columns of a Delta table. See Specify Delta statistics columns.

You can use the following workaround for your use case:

1. Use the following table property to list the column that you want to use for liquid clustering:

delta.dataSkippingStatsColumns

This property specifies a comma-separated list of column names for which Delta Lake collects statistics. It supersedes delta.dataSkippingNumIndexedCols.

Table properties can be set at table creation or with ALTER TABLE statements. See Delta table properties reference.

2. Then run the following query to collect statistics for that column:

ANALYZE TABLE table_name COMPUTE DELTA STATISTICS

3. Use the column in the CLUSTER BY clause to enable liquid clustering on it.
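
Put together, the three steps might look like the sketch below. The table name my_table and the column names event_id and customer_region are hypothetical placeholders, and the final OPTIMIZE is an extra step I am assuming you want so that data already in the table gets clustered. Since delta.dataSkippingStatsColumns replaces the default "first 32 columns" behaviour, list every column you still want statistics on, not just the new clustering key.

```sql
-- Step 1: list the columns to collect statistics on.
-- my_table, event_id and customer_region are placeholder names.
ALTER TABLE my_table SET TBLPROPERTIES (
  'delta.dataSkippingStatsColumns' = 'event_id,customer_region'
);

-- Step 2: recompute statistics so files already in the table
-- get stats for the newly listed columns.
ANALYZE TABLE my_table COMPUTE DELTA STATISTICS;

-- Step 3: use the column as a liquid clustering key.
ALTER TABLE my_table CLUSTER BY (customer_region);

-- Assumed follow-up: trigger clustering of existing data.
OPTIMIZE my_table;
```

To check that the property was applied, SHOW TBLPROPERTIES my_table should list delta.dataSkippingStatsColumns with the value you set.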
