Databricks Community

techuser · ‎11-27-2023

Hi,

Is it possible to convert an existing partitioned Delta table that already contains data to liquid clustering? If so, could you please suggest the steps required? I tried and searched but couldn't find any. Can liquid clustering only be enabled for new Delta tables? Please help.

techuser · ‎11-27-2023

@Retired_mod How can this be applied to an existing Delta table that is partitioned and already contains data? Could you please suggest the steps involved? The existing Delta table is partitioned, so the files in its location are laid out by partition. Is it possible to convert this to a clustered table? The existing Databricks documentation only mentions NEW tables.

techuser · ‎11-27-2023

Hi,

Sorry for asking again. My requirement is not to change the partition column of an existing Delta table. My requirement is to change an existing Delta table, for example Table A partitioned by Column 1 and Column 2, into Table A clustered by Column 3.

In other words, the requirement is to convert an existing partitioned Delta table into a Delta table clustered by a new column.

techuser · ‎11-27-2023

When defining a new table that uses liquid clustering, we end the statement with 'USING DELTA CLUSTER BY (Column1)'.

In the solution above, Point 3 (cluster by the new column) is written as

.partitionBy("colB")

How is this identified as CLUSTER? When creating a table, CLUSTER BY and PARTITION BY are two different clauses, and that is how a table is identified as clustered or partitioned.

With the approach explained above, does DESCRIBE on the table show it as clustered?

techuser · ‎11-27-2023

Thank you for the response!

DESCRIBE on a table gives the column names and data types, and also the PARTITION and CLUSTER columns. If clustering is used, it shows a Clustering Information section listing the columns used, and likewise for partitioning.
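To make that check concrete, here is a minimal sketch of reading the layout from table metadata. It assumes the row returned by `DESCRIBE DETAIL` exposes `partitionColumns` and `clusteringColumns` fields (the latter may depend on the runtime version), and the helper name `table_layout` and the sample column names are hypothetical. The function works on a plain dict, so it can be tried without a cluster.

```python
def table_layout(detail: dict) -> str:
    """Summarise how a Delta table is laid out, given one DESCRIBE DETAIL row as a dict.

    Assumption: the dict has 'partitionColumns' and 'clusteringColumns' keys,
    each holding a list of column names (missing or None is treated as empty).
    """
    clustering = detail.get("clusteringColumns") or []
    partitions = detail.get("partitionColumns") or []
    if clustering:
        return "clustered by " + ", ".join(clustering)
    if partitions:
        return "partitioned by " + ", ".join(partitions)
    return "no partitioning or clustering"


# Hand-written sample rows standing in for real DESCRIBE DETAIL output.
print(table_layout({"partitionColumns": ["colA"], "clusteringColumns": []}))
print(table_layout({"partitionColumns": [], "clusteringColumns": ["colB"]}))
```

In a Databricks notebook the dict could come from something like `spark.sql("DESCRIBE DETAIL table_a").first().asDict()`, where `table_a` is a placeholder table name.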

So, back to my previous question: is there a way to convert an existing partitioned Delta table to a clustered Delta table?

1. Table A is partitioned by column A

2. Take a backup of Table A as A_bkp

3. Replace, or drop and re-create, Table A clustered by column B

4. INSERT INTO Table A SELECT * FROM A_bkp

5. Drop A_bkp and remove the associated files

Is this a good approach?
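The steps above could be written out roughly as follows. This is a minimal sketch, not a confirmed Databricks procedure: the names `table_a`, `a_bkp`, `col_b` and the helper `migration_statements` are hypothetical, and it assumes the table can be re-created with `CREATE OR REPLACE TABLE ... CLUSTER BY ... AS SELECT`. If the runtime rejects the replace, dropping and re-creating the table is the other option from step 3. Because a CTAS loads the data in the same statement, steps 3 and 4 collapse into one, and a separate INSERT afterwards would duplicate the rows. The helper only builds SQL strings and does not execute anything.

```python
def migration_statements(table: str, backup: str, cluster_col: str) -> list[str]:
    """Build the SQL for moving a partitioned Delta table to liquid clustering.

    All names are placeholders supplied by the caller; nothing is executed here.
    """
    return [
        # Step 2: keep a copy of the current data.
        f"CREATE TABLE {backup} AS SELECT * FROM {table}",
        # Steps 3 and 4 combined: re-create the table with CLUSTER BY and
        # load it from the backup in one statement (no extra INSERT needed).
        f"CREATE OR REPLACE TABLE {table} CLUSTER BY ({cluster_col}) "
        f"AS SELECT * FROM {backup}",
        # Ask Delta to cluster the rewritten data.
        f"OPTIMIZE {table}",
        # Step 5: drop the backup once the new table has been verified.
        f"DROP TABLE {backup}",
    ]


statements = migration_statements("table_a", "a_bkp", "col_b")
for sql in statements:
    print(sql)  # in a Databricks notebook, this could be spark.sql(sql)
```

It is worth comparing row counts between the new table and the backup before running the final DROP.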

mwessman · ‎10-08-2024

This is an old reply, but I want to verify the last comment by Fatma.
You create the new table using "AS SELECT * FROM A_bkp", and in the next step you write another "INSERT INTO Table_A SELECT * FROM A_bkp". Is this just a typo, or why is the data inserted from the backup table twice?

Raja_Databricks · ‎04-08-2024

Does liquid clustering support MERGE? How can an upsert be done efficiently on a liquid clustered Delta table?
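For reference, an upsert of this kind might look like the sketch below. It assumes MERGE is accepted on a liquid clustered table as on any other Delta table, which this thread does not confirm, and that a follow-up `OPTIMIZE` is what re-clusters the newly written data. The names `table_a`, `updates`, `id` and the helper `upsert_statements` are hypothetical. The helper only builds SQL strings.

```python
def upsert_statements(target: str, source: str, key: str) -> list[str]:
    """Build a Delta MERGE (upsert) followed by OPTIMIZE for the target table.

    All names are placeholders supplied by the caller; nothing is executed here.
    """
    return [
        # Update rows whose key already exists, insert the rest.
        f"MERGE INTO {target} AS t USING {source} AS s ON t.{key} = s.{key} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *",
        # Assumption: clustering of the merged data is applied by OPTIMIZE.
        f"OPTIMIZE {target}",
    ]


upsert = upsert_statements("table_a", "updates", "id")
for sql in upsert:
    print(sql)  # in a Databricks notebook, this could be spark.sql(sql)
```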
