From Partitioning to Liquid Clustering

Volker
New Contributor III

We had some Delta tables that were previously partitioned on year, month, day, and hour. This resulted in quite small partitions, so we have now switched to liquid clustering.

We followed these steps:

  • Remove partitioning by doing REPLACE
  • ALTER TABLE --- CLUSTER BY 
  • Run OPTIMIZE --- FULL
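
For reference, the three steps above might look roughly like the following in Databricks SQL. This is only a sketch: the table name `my_catalog.my_schema.events` is hypothetical (the post does not name the table), and rewriting the table with CREATE OR REPLACE TABLE ... AS SELECT is an assumption about how the REPLACE was done.

```sql
-- Hypothetical table name; substitute your own.

-- 1. Rewrite the table without a PARTITIONED BY clause, which drops the
--    partitioning from the table metadata (assumed form of the REPLACE step).
CREATE OR REPLACE TABLE my_catalog.my_schema.events
AS SELECT * FROM my_catalog.my_schema.events;

-- 2. Declare the liquid clustering key.
ALTER TABLE my_catalog.my_schema.events
CLUSTER BY (processing_dttm);

-- 3. Recluster all existing records, not only newly written data.
OPTIMIZE my_catalog.my_schema.events FULL;
```

Each of these statements creates a new table version in the Delta log; none of them physically deletes the data files of earlier versions.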

The query output shows that some files were written and some were removed, but in the underlying S3 bucket I still see the Parquet files in the old Hive-style partition layout.

Are these old files that a VACUUM job will eventually remove? If not, what does OPTIMIZE do when the data in the root bucket is still stored in the Hive-style partition layout even though we removed the partitioning from the Delta table?
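
One way to check this might be to compare what the Delta log currently references with what VACUUM would delete. The sketch below again uses the hypothetical table name `my_catalog.my_schema.events`. It relies on the general Delta behavior that files removed from the log by REPLACE or OPTIMIZE stay in storage until VACUUM deletes them after the retention period (7 days by default).

```sql
-- Hypothetical table name; substitute your own.

-- Current table metadata: partitionColumns should be empty,
-- clusteringColumns should list the clustering key, and numFiles
-- counts only the files the current version references.
DESCRIBE DETAIL my_catalog.my_schema.events;

-- Operations recorded in the Delta log (REPLACE, CLUSTER BY, OPTIMIZE).
DESCRIBE HISTORY my_catalog.my_schema.events;

-- List files that VACUUM would delete, without deleting anything.
-- Files only become eligible once they are older than the retention period.
VACUUM my_catalog.my_schema.events DRY RUN;
```

If the files under the old year/month/day/hour directories show up in the dry run once the retention period has passed, they are unreferenced leftovers rather than live data.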

Also, we are now using processing_dttm as the clustering key instead of year, month, day, and hour. The processing_dttm column contains timestamps such as 2024-11-19T09:30:00.765+00:00.
Would it be better to cluster only on the year, month, and day, or perhaps down to the hour, rather than on minutes or seconds? Or is liquid clustering smart enough to infer this from the timestamp?

