
Liquid Clustering

hrishiharsh25
New Contributor

How can I use a column for liquid clustering when it is not among the first 32 columns of my Delta table schema?

1 REPLY

PotnuruSiva
Databricks Employee

We can only specify columns with statistics collected for clustering keys. By default, the first 32 columns in a Delta table have statistics collected. See Specify Delta statistics columns.

You can use the following workaround for your use case:

1. Set the following table property to list the column you want to use as a liquid clustering key:

delta.dataSkippingStatsColumns

This property specifies a comma-separated list of column names for which Delta Lake collects statistics. It supersedes delta.dataSkippingNumIndexedCols.

Table properties can be set at table creation or with ALTER TABLE statements. See Delta table properties reference.

2. Then run the following query to collect statistics for that column:

ANALYZE TABLE table_name COMPUTE DELTA STATISTICS

3. Use the column in the CLUSTER BY clause for liquid clustering.
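
Put together, the three steps might look like the sketch below. The table name main.sales.events and the column name event_region are hypothetical placeholders, not names from the question; substitute your own. It also assumes a runtime that supports ANALYZE TABLE ... COMPUTE DELTA STATISTICS.

```sql
-- Hypothetical names: main.sales.events (table), event_region (a column
-- that sits beyond the first 32 columns of the schema).

-- Step 1: tell Delta Lake to collect statistics for the column.
ALTER TABLE main.sales.events
SET TBLPROPERTIES ('delta.dataSkippingStatsColumns' = 'event_region');

-- Step 2: compute statistics for data already in the table.
ANALYZE TABLE main.sales.events COMPUTE DELTA STATISTICS;

-- Step 3: use the column as a liquid clustering key.
ALTER TABLE main.sales.events CLUSTER BY (event_region);
```

Because delta.dataSkippingStatsColumns takes precedence over the default first-32-columns behavior, it is probably worth listing any other columns you rely on for data skipping in the same comma-separated value. Existing data is typically reclustered only when OPTIMIZE runs on the table.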
