From Partitioning to Liquid Clustering
Thursday
We had some Delta tables that were previously partitioned by year, month, day, and hour. This resulted in quite small partitions, so we have now switched to liquid clustering.
We followed these steps:
- Remove the partitioning by doing a REPLACE
- ALTER TABLE --- CLUSTER BY
- Run OPTIMIZE --- FULL
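The three steps above can be written out as SQL statements roughly like the ones below. This is a sketch only: the helper `migration_statements` and the table name `main.analytics.events` are hypothetical, and the REPLACE form (`CREATE OR REPLACE TABLE ... AS SELECT * FROM` the same table) is an assumption about what "doing a REPLACE" meant here. In a Databricks notebook each statement could be passed to `spark.sql`.

```python
from typing import List


def migration_statements(table: str, cluster_cols: List[str]) -> List[str]:
    """Build the SQL for the three migration steps (hypothetical helper)."""
    cols = ", ".join(cluster_cols)
    return [
        # Step 1 (assumed form): rewrite the table without PARTITIONED BY
        f"CREATE OR REPLACE TABLE {table} AS SELECT * FROM {table}",
        # Step 2: declare the liquid clustering key(s)
        f"ALTER TABLE {table} CLUSTER BY ({cols})",
        # Step 3: recluster all existing records, not only newly written files
        f"OPTIMIZE {table} FULL",
    ]


if __name__ == "__main__":
    for stmt in migration_statements("main.analytics.events", ["processing_dttm"]):
        print(stmt)  # in a Databricks notebook: spark.sql(stmt)
```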
I can see in the query output that some files were written and some were removed, but in the underlying S3 bucket I still see the Parquet files in the old Hive-style partition layout.
Are these old files that will be removed by a VACUUM job? Or what does OPTIMIZE do when the data in the root bucket is still stored in the Hive-style partition layout, even though we removed the partitioning from the Delta table?
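As a rough mental model: in Delta Lake, operations such as REPLACE and OPTIMIZE mark superseded files as removed in the transaction log, but the Parquet files stay in storage until a VACUUM deletes files that are no longer referenced and were removed longer ago than the retention period (7 days by default). The sketch below is a simplified Python model of that rule, not Delta's actual implementation; the function name, file paths, and timestamps are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Set


def vacuum_candidates(
    files_on_storage: Iterable[str],
    active_files: Set[str],
    removed_at: Dict[str, datetime],
    now: datetime,
    retention: timedelta = timedelta(days=7),
) -> List[str]:
    """Simplified model: a file is eligible for deletion only if the current
    table version no longer references it and it was logically removed
    longer ago than the retention window."""
    cutoff = now - retention
    candidates = []
    for path in sorted(files_on_storage):
        if path in active_files:
            continue  # still part of the current table version
        ts = removed_at.get(path)
        if ts is not None and ts < cutoff:
            candidates.append(path)
    return candidates


if __name__ == "__main__":
    # Hypothetical listing: two old Hive-style files and one newly written file
    old_a = "year=2024/month=11/day=19/hour=09/part-a.parquet"
    old_b = "year=2024/month=11/day=19/hour=10/part-b.parquet"
    new_c = "part-c.parquet"
    print(vacuum_candidates(
        files_on_storage=[old_a, old_b, new_c],
        active_files={new_c},
        removed_at={old_a: datetime(2024, 11, 19), old_b: datetime(2024, 11, 26)},
        now=datetime(2024, 11, 28),
    ))  # ['year=2024/month=11/day=19/hour=09/part-a.parquet']
```

In this model the old partition-style files disappear only once they are both unreferenced and past the retention cutoff, which is why they can still be visible in the bucket right after OPTIMIZE.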
Also, we are now using processing_dttm as the cluster key instead of year, month, day, and hour. The processing_dttm column contains timestamps like this: 2024-11-19T09:30:00.765+00:00.
Would it be better to cluster only on year, month, and day (or maybe hour) instead of keeping minutes and seconds? Or is liquid clustering smart enough to infer this from the timestamp?
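For the granularity question, it may help to recall that Delta Lake keeps per-file min/max column statistics and uses them to skip files for range filters. The simplified model below (hypothetical file names and statistics, not Databricks' implementation of liquid clustering or data skipping) shows that a filter on an hour or day range can prune files using the raw `processing_dttm` values, regardless of their sub-second precision.

```python
from datetime import datetime, timezone
from typing import List, Tuple

# (file path, min processing_dttm, max processing_dttm) -- hypothetical stats
FileStats = Tuple[str, datetime, datetime]


def files_to_scan(stats: List[FileStats], lo: datetime, hi: datetime) -> List[str]:
    """Return files whose [min, max] range overlaps the half-open query range [lo, hi)."""
    return [path for path, fmin, fmax in stats if fmin < hi and fmax >= lo]


ts = datetime.fromisoformat  # parses values like 2024-11-19T09:30:00.765+00:00

STATS: List[FileStats] = [
    ("file-1", ts("2024-11-19T00:05:12.001+00:00"), ts("2024-11-19T11:59:59.999+00:00")),
    ("file-2", ts("2024-11-19T12:00:00.000+00:00"), ts("2024-11-19T23:59:59.999+00:00")),
    ("file-3", ts("2024-11-20T00:00:00.123+00:00"), ts("2024-11-20T08:00:00.000+00:00")),
]

if __name__ == "__main__":
    # Filter on one hour: processing_dttm >= 09:00 AND processing_dttm < 10:00
    hour_lo = datetime(2024, 11, 19, 9, tzinfo=timezone.utc)
    hour_hi = datetime(2024, 11, 19, 10, tzinfo=timezone.utc)
    print(files_to_scan(STATS, hour_lo, hour_hi))  # ['file-1']
```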

