11-27-2023 07:18 PM
Hi,
Is it possible to convert an existing partitioned Delta table that already contains data to liquid clustering? If so, can you please suggest the steps required? I tried and searched but couldn't find anything. Can liquid clustering only be used for new Delta tables? Please help.
11-27-2023 09:57 PM - edited 11-27-2023 10:07 PM
@Retired_mod How can this be applied to an existing Delta table that is partitioned and already has data? Can you please suggest the steps involved? The existing Delta table is partitioned, so the files in its location are all laid out by partition. Is it possible to convert this table to clustering? The existing Databricks documentation only mentions it for NEW tables.
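A rough sketch of the two cases, assuming Databricks SQL run from a notebook (for example via `spark.sql(...)`): an unpartitioned table can switch to clustering in place with ALTER TABLE ... CLUSTER BY, while a partitioned table has to be rewritten because partitioning and liquid clustering cannot be combined on one table. `enable_clustering_sql` and `table_a` are hypothetical names, and the rewrite branch assumes your runtime lets CREATE OR REPLACE read from the table it is replacing; if it does not, go through a backup copy instead.

```python
def enable_clustering_sql(table: str, cluster_cols: list[str], partitioned: bool) -> str:
    """Return one statement that moves `table` to liquid clustering (hypothetical helper)."""
    if not cluster_cols:
        raise ValueError("at least one clustering column is required")
    cols = ", ".join(cluster_cols)
    if partitioned:
        # Partitioning and liquid clustering cannot coexist, so the table is rewritten
        # without PARTITIONED BY. Assumption: CREATE OR REPLACE may read from the
        # table it is replacing on your runtime.
        return f"CREATE OR REPLACE TABLE {table} CLUSTER BY ({cols}) AS SELECT * FROM {table}"
    # Unpartitioned table: set the clustering keys in place; existing files are
    # expected to be clustered by a later OPTIMIZE run.
    return f"ALTER TABLE {table} CLUSTER BY ({cols})"
```

Table A in this thread is partitioned, so it would take the rewrite path, followed by `OPTIMIZE table_a` to cluster the data.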
11-27-2023 10:26 PM
Hi,
Sorry for asking again. My requirement is not to change the partition column of an existing Delta table. It is to change an existing Delta table, for example Table A partitioned by Column 1 and Column 2, into Table A clustered by Column 3.
In other words, the requirement is to convert an existing partitioned Delta table into a liquid clustered Delta table on a new column.
11-27-2023 10:48 PM
When defining a new table that uses liquid clustering, we add 'USING DELTA CLUSTER BY (Column1)' at the end.
In the solution above, point 3 ("Cluster by the new column") uses
.partitionBy("colB")
How is the table identified as clustered, then? When creating a table, CLUSTER BY and PARTITIONED BY are two different clauses, and that is how a table is identified as clustered or partitioned.
With the approach explained above, would DESCRIBE TABLE show it as clustered?
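A DataFrame write with `.partitionBy("colB")` produces a partitioned table, not a clustered one, so DESCRIBE would not report clustering for it. A small sketch of checking the layout programmatically, assuming the `partitionColumns` and `clusteringColumns` fields that Databricks' DESCRIBE DETAIL returns. `table_layout` is a hypothetical helper, and the row is passed in as a plain dict (for example `spark.sql("DESCRIBE DETAIL table_a").first().asDict()` in a notebook).

```python
def table_layout(detail: dict) -> str:
    """Describe a Delta table's data layout from one DESCRIBE DETAIL row.

    Assumes the `partitionColumns` and `clusteringColumns` fields reported by
    Databricks; missing or null fields are treated as empty lists.
    """
    partition_cols = detail.get("partitionColumns") or []
    cluster_cols = detail.get("clusteringColumns") or []
    if partition_cols and cluster_cols:
        # Liquid clustering cannot be combined with partitioning, so this is unexpected.
        raise ValueError("table reports both partition and clustering columns")
    if cluster_cols:
        return "clustered by " + ", ".join(cluster_cols)
    if partition_cols:
        return "partitioned by " + ", ".join(partition_cols)
    return "neither partitioned nor clustered"
```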
11-27-2023 11:36 PM
Thank you for the response!
DESCRIBE TABLE lists each column with its data type, and also the partition and clustering columns. If clustering is used, the output has a Clustering Information section naming the clustering columns, and likewise for partitioning.
So, back to my previous question: is there a way to convert an existing partitioned Delta table into a clustered Delta table? Here is what I have in mind:
1. Table A is partitioned by column A.
2. Take a backup of Table A as A_bkp.
3. Replace (or drop and re-create) Table A, clustered by column B.
4. INSERT INTO A SELECT * FROM A_bkp
5. Drop A_bkp and remove its associated files.
Is this a good approach?
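The steps above can be sketched as the following statements, with steps 3 and 4 folded into one CREATE OR REPLACE TABLE ... AS SELECT, since a CTAS already loads the rows and a following INSERT would copy them a second time. `migration_statements`, `table_a` and the `_bkp` suffix are hypothetical; DEEP CLONE is used for the backup so it is an independent copy of the data files. Treat this as a sketch to run one statement at a time (for example with `spark.sql`), validating the new table before the final DROP.

```python
def migration_statements(table: str, cluster_cols: list[str]) -> list[str]:
    """SQL for moving a partitioned Delta table to liquid clustering via a backup copy."""
    if not cluster_cols:
        raise ValueError("at least one clustering column is required")
    backup = f"{table}_bkp"  # hypothetical naming convention
    cols = ", ".join(cluster_cols)
    return [
        # Step 2: independent copy of the current data and metadata.
        f"CREATE TABLE {backup} DEEP CLONE {table}",
        # Steps 3 and 4 together: CTAS recreates the table without partitions, clustered
        # by the new columns, and loads the rows. No separate INSERT afterwards, or
        # every row would be written twice.
        f"CREATE OR REPLACE TABLE {table} CLUSTER BY ({cols}) AS SELECT * FROM {backup}",
        # Cluster the loaded data.
        f"OPTIMIZE {table}",
        # Step 5: drop the backup only after the new table has been validated.
        f"DROP TABLE {backup}",
    ]
```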
10-08-2024 05:25 AM
This is an old reply, but I want to verify the last comment by Fatma.
You create the new table using "AS SELECT * FROM A_bkp", and in the next step you run another "INSERT INTO Table_A SELECT * FROM A_bkp". Is this just a typo, or is there a reason to insert the data from the backup table twice?
04-08-2024 05:13 AM
Does liquid clustering support MERGE? How can upserts be done efficiently on a liquid clustered Delta table?
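MERGE is an ordinary Delta write, so it can be run against a liquid clustered table like any other Delta table. A minimal sketch that builds the upsert statement, assuming the hypothetical names `table_a` and `updates`, join keys `id` and `col3`, and a source with the same schema as the target (required by UPDATE SET * and INSERT *). Putting the clustering column in the ON condition is a common suggestion so Delta can skip files that cannot match; treat that as a tuning assumption, and run OPTIMIZE afterwards to cluster the newly written files.

```python
def upsert_sql(target: str, source: str, keys: list[str]) -> str:
    """Build a Delta MERGE that updates matching rows and inserts new ones."""
    if not keys:
        raise ValueError("at least one join key is required")
    # Assumption: including the clustering column(s) among the keys lets Delta
    # skip files that cannot contain matching rows.
    on = " AND ".join(f"t.{k} = s.{k}" for k in keys)
    return (
        f"MERGE INTO {target} AS t USING {source} AS s ON {on} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )
```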