10-14-2022 06:07 AM
We didn't need to set partitions for our delta tables as we didn't have many performance concerns and delta lake out-of-the-box optimization worked great for us. But there is now a need to set a specific partition column for some tables to allow concurrent delta merges into the partitions.
We are using unmanaged tables with the data sitting in s3
What is the best way to add/update partition columns on an existing delta table?
I have tried the `ALTER TABLE log ADD PARTITION(date = DATE'2021-09-10');` but it didn't work also this doesn't add partition for all values of date
Also tried rewriting the table and setting partition column with:
(
df.write.format("delta")
.mode("overwrite")
.option("overwriteSchema", "true")
.partitionBy(<Col Name>)
.saveAsTable(<Table Name>)
)
But I don't see the partition name when I check the table with `DESCRIBE TABLE`, So not sure if this is the proper way to approach this.
Another option is to recreate the tables as i do see that we can set partition columns while creating a table, But don't really want to do this except maybe as a last resort.
10-18-2022 04:43 AM
Hi @Harikrishnan P H,
Delta Lake lets you update the schema of a table. The following types of changes are supported:
You can make these changes explicitly using DDL or implicitly using DML.
Important:-
When you update a Delta table schema, streams that read from that table terminate. If you want the stream to continue, you must restart it.
For recommended methods, see Production considerations for Structured Streaming applications on Databricks.
Explicitly update schema
You can use the following DDL to explicitly change the schema of a table.
To Add columns,
ALTER TABLE table_name ADD COLUMNS (col_name data_type [COMMENT col_comment] [FIRST|AFTER colA_name], ...)
By default, nullability is true.
To add a column to a nested field, use the following:
ALTER TABLE table_name ADD COLUMNS (col_name.nested_col_name data_type [COMMENT col_comment] [FIRST|AFTER colA_name], ...)
Example:-
If the schema before running ALTER TABLE boxes ADD COLUMNS (colB.nested STRING AFTER field1) is:
- root
| - colA
| - colB
| +-field1
| +-field2
The schema after is:
- root
| - colA
| - colB
| +-field1
| +-nested
| +-field2
Note:-
Adding nested columns is supported only for structs. Arrays and maps are not supported.
Rest you can find it here.
Source:- https://docs.databricks.com/delta/delta-batch.html#add-columns
---------------------------------------------------------------------
" I have tried the `ALTER TABLE ADD PARTITION,` but it didn't work. "
Would you mind sharing the code snippet here?
10-18-2022 04:50 AM
Updated the description
10-20-2022 05:24 AM
Just read it and save it partitioned under the same name. But please back up first!
10-20-2022 10:12 PM
Hi @Hubert Dudek ,
Thanks for the reply.
So, the only way is to read the entire data repartition and then write, correct?
Will this add a new partition if a new value for the partition key comes
10-20-2022 08:09 AM
Hi @Harikrishnan P H , We haven’t heard from you since the last response from @Hubert Dudek , and I was checking back to see if you have a resolution yet.
If you have any solution, please share it with the community as it can be helpful to others. Otherwise, we will respond with more details and try to help.
Also, Please don't forget to click on the "Select As Best" button whenever the information provided helps resolve your question.
Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.
If there isn’t a group near you, start one and help create a community that brings people together.
Request a New Group