How to identify which columns we need to consider ...

TejeshS · 01-03-2025

In Databricks, when working with a table that has a large number of columns (e.g., 200), it can be challenging to determine which columns are most important for liquid clustering.

Objective: Select the columns that contribute most meaningfully to clustering, thereby improving query performance and insights.

How can we identify which columns to consider for liquid clustering in a table with 200+ columns?
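One possible starting point, offered as a rough sketch rather than an established method: rank columns by how often they appear in the filter predicates of typical queries, since clustering mainly helps queries that filter on the clustering keys. The example below is plain Python. The `rank_filter_columns` helper, the sample queries, and the column names are all hypothetical, and the WHERE-clause matching is deliberately crude (no real SQL parsing).

```python
import re
from collections import Counter


def rank_filter_columns(queries, columns, top_n=4):
    """Rank columns by the number of queries that mention them after WHERE.

    Hypothetical helper: a crude text match, not a SQL parser. Everything
    after the first WHERE is scanned, so GROUP BY / ORDER BY mentions count too.
    """
    counts = Counter()
    for sql in queries:
        # Split once on the first WHERE keyword; skip queries without one.
        parts = re.split(r"\bwhere\b", sql, maxsplit=1, flags=re.IGNORECASE)
        if len(parts) < 2:
            continue
        predicate = parts[1].lower()
        for col in columns:
            # Whole-word match so "date" does not match "order_date".
            if re.search(rf"\b{re.escape(col.lower())}\b", predicate):
                counts[col] += 1
    # Most frequently filtered columns first, limited to top_n.
    return [col for col, _ in counts.most_common(top_n)]


# Hypothetical sample workload and a small subset of the table's columns.
queries = [
    "SELECT * FROM sales WHERE order_date >= '2024-01-01' AND region = 'EU'",
    "SELECT customer_id, amount FROM sales WHERE customer_id = 42",
    "SELECT * FROM sales WHERE order_date = '2024-06-01' AND customer_id = 7",
    "SELECT count(*) FROM sales",
    "SELECT * FROM sales WHERE order_date < '2023-01-01'",
]
columns = ["order_date", "region", "customer_id", "amount", "status"]

ranked = rank_filter_columns(queries, columns)
print(ranked)
```

The default `top_n=4` reflects the limit of four clustering keys in liquid clustering. The ranked names are only candidates for `CLUSTER BY`. Frequency in filters is one signal, and column cardinality and data types would still need checking against the actual table.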