Louis_Frolio
Databricks Employee

@kanikvijay9, really great post. Dropping runtime from 22.4 hours to 8–12 hours is no small feat; that's some serious optimization work. A few thoughts that might take it even further:

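Let’s start with Adaptive Query Execution (AQE). It's enabled by default on recent Spark and Databricks runtimes, so first confirm nothing in the job or cluster config has switched it off. AQE coalesces shuffle partitions and splits skewed join partitions at runtime using actual data stats, which often saves a ton of manual trial and error with spark.sql.shuffle.partitions.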
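
If it helps, here's a minimal sketch of checking and setting the AQE knobs from a notebook. The configuration keys are standard Spark SQL settings; the `apply_aqe_settings` helper and the 128m advisory size are just my illustration, not something from your pipeline, so tune the values to your data.

```python
# Standard Spark SQL configuration keys for AQE; the values are illustrative.
AQE_SETTINGS = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
    # Assumed target size for post-shuffle partitions; adjust for your workload.
    "spark.sql.adaptive.advisoryPartitionSizeInBytes": "128m",
}


def apply_aqe_settings(spark, settings=AQE_SETTINGS):
    """Set each key on the session, then return the values read back from it.

    `spark` is an existing SparkSession (hypothetical helper, not a Spark API).
    """
    for key, value in settings.items():
        spark.conf.set(key, value)
    return {key: spark.conf.get(key) for key in settings}
```

In a Databricks notebook you would call `apply_aqe_settings(spark)` once before the heavy joins or aggregations and compare the shuffle stages in the Spark UI before and after.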

Then there’s column pruning. With over two thousand columns, make sure each query selects only the columns it needs, and analyze which sets of columns are most frequently queried together. If clear patterns emerge, you might consider splitting the table into a few narrower ones that share a key. That can make queries cheaper to run and the tables easier to manage.
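
One rough way to look for those patterns is to count how often pairs of columns show up in the same query. This is only a sketch: the `co_access_counts` helper and the sample query log are made up for illustration, and in practice you would pull the column lists from your own query history.

```python
from collections import Counter
from itertools import combinations


def co_access_counts(query_columns):
    """Count how often each pair of columns appears in the same query."""
    pairs = Counter()
    for cols in query_columns:
        # Sort and de-duplicate so each pair has one canonical ordering.
        for a, b in combinations(sorted(set(cols)), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical query log: one list of referenced columns per query.
query_log = [
    ["customer_id", "order_total", "order_date"],
    ["customer_id", "order_total"],
    ["sensor_a", "sensor_b"],
    ["customer_id", "order_date"],
]

counts = co_access_counts(query_log)
# The most frequent pairs are candidates for living in the same narrow table.
top_pairs = counts.most_common(2)
```

Columns that rarely or never co-occur with the hot groups are the ones worth moving out of the main table.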

And for Databricks Runtime 13.3 LTS and above, Liquid Clustering is a game-changer (Databricks recommends 15.2+ for tables that use it). It handles high-cardinality columns gracefully and replaces both partitioning and manual ZORDERing, so that's one less maintenance headache to worry about.
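
For reference, switching an existing Delta table over is a pair of SQL statements. The sketch below just builds them as strings; the `liquid_clustering_statements` helper, the table name, and the column names are placeholders of mine. As far as I know, liquid clustering accepts at most four clustering columns and can't be combined with partitioning or ZORDER on the same table, so a partitioned table would need to be rewritten as an unpartitioned one first.

```python
def liquid_clustering_statements(table, columns):
    """Build the SQL to enable liquid clustering on a table and trigger clustering.

    Hypothetical helper: it only formats strings and does not touch Spark.
    """
    if not columns:
        raise ValueError("at least one clustering column is required")
    if len(columns) > 4:
        # Assumed limit: liquid clustering allows up to four clustering keys.
        raise ValueError("liquid clustering supports at most four columns")
    cols = ", ".join(columns)
    return [
        f"ALTER TABLE {table} CLUSTER BY ({cols})",
        f"OPTIMIZE {table}",  # OPTIMIZE applies the clustering to the data
    ]


# Placeholder names; substitute your own catalog, schema, table, and columns.
statements = liquid_clustering_statements(
    "main.analytics.wide_table", ["customer_id", "event_date"]
)
```

Each statement can then be run with `spark.sql(stmt)` in a notebook.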

Out of curiosity, which column(s) did you land on for partitioning the Delta table? That choice alone can make or break both write throughput and read performance.

Cheers, Louis.