
Liquid Clustering

hrishiharsh25
New Contributor

How can I use a column for liquid clustering if it is not among the first 32 columns of my Delta table schema?

1 REPLY

PotnuruSiva
Databricks Employee

Only columns that have statistics collected can be used as clustering keys. By default, statistics are collected for the first 32 columns of a Delta table. See "Specify Delta statistics columns" in the Databricks documentation.

You can use the following workaround for your use case:

1. Set the table property below so that statistics are collected for the column you want to use for liquid clustering:

delta.dataSkippingStatsColumns

This property takes a comma-separated list of column names for which Delta Lake collects statistics. It supersedes dataSkippingNumIndexedCols.

Table properties can be set at table creation or with ALTER TABLE statements. See Delta table properties reference.

2. Then run the following command to collect statistics for that column:

ANALYZE TABLE table_name COMPUTE DELTA STATISTICS

3. Use the column in the CLUSTER BY clause for liquid clustering.
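
Put together, the three steps might look like the sketch below. The table name main.sales.events and the column names are hypothetical placeholders, not taken from the question; event_region stands in for a column that sits beyond the first 32 in the schema. Because delta.dataSkippingStatsColumns supersedes the default behaviour, the list should probably include every column you still want statistics for, not just the new clustering key.

```sql
-- Hypothetical names throughout: main.sales.events, event_id, event_date, event_region.

-- Step 1: tell Delta which columns to collect statistics for.
-- This list takes the place of the default "first 32 columns",
-- so keep any columns you already rely on for data skipping.
ALTER TABLE main.sales.events
SET TBLPROPERTIES (
  'delta.dataSkippingStatsColumns' = 'event_id,event_date,event_region'
);

-- Step 2: recompute statistics so existing data files pick up the new column list.
ANALYZE TABLE main.sales.events COMPUTE DELTA STATISTICS;

-- Step 3: use the column as a liquid clustering key.
ALTER TABLE main.sales.events CLUSTER BY (event_region);

-- Assumption: changing the clustering keys does not rewrite existing data by itself;
-- running OPTIMIZE afterwards triggers clustering.
OPTIMIZE main.sales.events;
```

If the table does not exist yet, the same property can be set in TBLPROPERTIES at creation time, alongside CLUSTER BY in the CREATE TABLE statement.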
