Databricks Community

ShivangiB · ‎03-05-2025

If i already have a cluster key1 for existing table, i want to change cluster key to key2 using

ALTER TABLE table CLUSTER BY (key2), then run OPTIMIZE table, based on databrick document , existing files will not be rewritten (verified by my test as well), cluster Key2 will only be applied incrementally. If my existing files are huge (saying having long history), using Key2 will not improve performance when i search the table (e.g. select * from table where key2='xxxx'). is liquid cluster supposed to work this way or i am missing anything?

koji_kawamura · ‎03-11-2025

Hi @ShivangiB Yes, Liquid cluster is supposed to work that way. The behavior that only applies new keys to the new data supports the characteristic of "Tables with access patterns that change over time". It allows us to change clustering keys based on query pattern changes without rewriting the historical data.

If the historical data needs to use the new clustering keys, we can use OPTIMIZE FULL, which requires I/O costs and time.

lingareddy_Alva · ‎03-11-2025

@ShivangiB

You're correct in your understanding. When you change a clustering key using ALTER TABLE followed by OPTIMIZE, it doesn't automatically recluster existing data. Let me explain why this happens and what options you have.In Delta Lake (which Databricks uses), the ALTER TABLE CLUSTER BY statement only changes the clustering specification for future write operations. The OPTIMIZE command without additional parameters will only compact small files but won't reorder data according to the new clustering key.For your existing data to benefit from the new clustering key (key2), you need to explicitly request reclustering. You have two options:

Run OPTIMIZE with ZORDER BY: OPTIMIZE table ZORDER BY (key2)
Or rewrite the table entirely:

CREATE OR REPLACE TABLE table_temp CLUSTER BY (key2) AS SELECT * FROM table; -- Then rename tables

ALTER TABLE table RENAME TO table_old; ALTER TABLE table_temp RENAME TO table;

The reason your test showed that existing files weren't rewritten is because OPTIMIZE alone primarily focuses on file compaction rather than data reorganization. For large historical datasets, you need to explicitly trigger the reclustering process to see performance improvements for queries filtering on key2.

LR

Databricks Community

Liquid Clustering Key Change Question

Join Us as a Local Community Builder!

PSA: Community Edition retires on January 1, 2026. Move to the Free Edition today to keep your work.

🎤 Call for Presentations: Data + AI Summit 2026 is Open!

Last Chance: Help Shape the 2026 Data + AI Summit | Win a Full Conference Pass

🌟 Community Pulse: Your Weekly Roundup! December 05 – 11, 2025

Jaipur Usergroup First Virtual Meetup: AI/BI Genie + Data Science Careers — 19 Dec | 6 PM IST