How to identify which columns we need to consider for liquid clustering from a table of 200+ columns

TejeshS
Contributor II

In Databricks, when working with a table that has a large number of columns (e.g., 200), it can be challenging to determine which columns are most important for liquid clustering.

Objective: The goal is to determine which columns to select based on their ability to meaningfully contribute to the clustering process, thereby improving query performance and insights.