How to add a partition column to an existing Delta table
10-14-2022 06:07 AM
We didn't need to set partitions for our Delta tables because we didn't have many performance concerns and Delta Lake's out-of-the-box optimization worked great for us. But we now need to set a specific partition column on some tables to allow concurrent Delta merges into the partitions.
We are using unmanaged tables with the data sitting in S3.
What is the best way to add or update partition columns on an existing Delta table?
I have tried `ALTER TABLE log ADD PARTITION(date = DATE'2021-09-10');` but it didn't work. It also wouldn't add partitions for all values of date.
Also tried rewriting the table and setting partition column with:
(
df.write.format("delta")
.mode("overwrite")
.option("overwriteSchema", "true")
.partitionBy(<Col Name>)
.saveAsTable(<Table Name>)
)
But I don't see the partition name when I check the table with `DESCRIBE TABLE`, so I'm not sure if this is the proper way to approach this.
Another option is to recreate the tables, as I do see that we can set partition columns while creating a table, but I don't really want to do this except maybe as a last resort.
- Labels: Delta, Delta Tables, Partition Column
10-18-2022 04:50 AM
Updated the description
10-20-2022 05:24 AM
Just read it and save it partitioned under the same name. But please back up first!
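For anyone following along, the "back up, read, and save partitioned under the same name" suggestion could look roughly like the sketch below. This is only an illustration under some assumptions: the helper name `rewrite_partitioned` and its parameters are made up for this example, the backup uses `DEEP CLONE` (available on Databricks; other environments may need a different backup method), and `spark` is an existing SparkSession. It returns the `partitionColumns` field of `DESCRIBE DETAIL`, which is one place to check the partitioning after the rewrite.

```python
def rewrite_partitioned(spark, table_name, partition_col, backup_name):
    """Back up a Delta table, then overwrite it in place, partitioned by partition_col.

    Hypothetical helper for illustration; all names here are assumptions.
    """
    # Back up first. DEEP CLONE copies the data files as well as the metadata
    # (assumes a Databricks runtime that supports it).
    spark.sql(f"CREATE TABLE {backup_name} DEEP CLONE {table_name}")

    # Read the current contents of the table.
    df = spark.table(table_name)

    # Overwrite the same table with the new partitioning.
    # overwriteSchema=true is needed for Delta to accept a partitioning change.
    (
        df.write.format("delta")
        .mode("overwrite")
        .option("overwriteSchema", "true")
        .partitionBy(partition_col)
        .saveAsTable(table_name)
    )

    # Report the partition columns recorded in the table's Delta metadata.
    detail = spark.sql(f"DESCRIBE DETAIL {table_name}").collect()[0]
    return detail["partitionColumns"]
```

Example call, with hypothetical table names: `rewrite_partitioned(spark, "log", "date", "log_backup")`. For large tables this rewrites all of the data, so it is worth trying on a copy first.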
10-20-2022 10:12 PM
Hi @Hubert Dudek ,
Thanks for the reply.
So the only way is to read the entire dataset, repartition it, and then write it back, correct?
Will this add a new partition automatically when a new value for the partition key arrives?

