cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

How to add the partition for an existing delta table

hari
Contributor

We didn't need to set partitions for our delta tables as we didn't have many performance concerns and delta lake out-of-the-box optimization worked great for us. But there is now a need to set a specific partition column for some tables to allow concurrent delta merges into the partitions.

We are using unmanaged tables with the data sitting in s3

What is the best way to add/update partition columns on an existing delta table?

I have tried the `ALTER TABLE log ADD PARTITION(date = DATE'2021-09-10');` but it didn't work also this doesn't add partition for all values of date

Also tried rewriting the table and setting partition column with:

(
df.write.format("delta")
.mode("overwrite")
.option("overwriteSchema", "true")
 .partitionBy(<Col Name>)
 .saveAsTable(<Table Name>)
)

But I don't see the partition name when I check the table with `DESCRIBE TABLE`, So not sure if this is the proper way to approach this.

Another option is to recreate the tables as i do see that we can set partition columns while creating a table, But don't really want to do this except maybe as a last resort.

5 REPLIES 5

Kaniz_Fatma
Community Manager
Community Manager

Hi @Harikrishnan P H​,

Delta Lake lets you update the schema of a table. The following types of changes are supported:

  • Adding new columns (at arbitrary positions)
  • Reordering existing columns
  • Renaming existing columns

You can make these changes explicitly using DDL or implicitly using DML.

Important:-

When you update a Delta table schema, streams that read from that table terminate. If you want the stream to continue, you must restart it.

For recommended methods, see Production considerations for Structured Streaming applications on Databricks.

Explicitly update schema

You can use the following DDL to explicitly change the schema of a table.

To Add columns,

ALTER TABLE table_name ADD COLUMNS (col_name data_type [COMMENT col_comment] [FIRST|AFTER colA_name], ...)

By default, nullability is true.

To add a column to a nested field, use the following:

ALTER TABLE table_name ADD COLUMNS (col_name.nested_col_name data_type [COMMENT col_comment] [FIRST|AFTER colA_name], ...)

Example:-

If the schema before running ALTER TABLE boxes ADD COLUMNS (colB.nested STRING AFTER field1) is:

- root

| - colA

| - colB

| +-field1

| +-field2

The schema after is:

- root

| - colA

| - colB

| +-field1

| +-nested

| +-field2

Note:-

Adding nested columns is supported only for structs. Arrays and maps are not supported.

Rest you can find it here.

Source:- https://docs.databricks.com/delta/delta-batch.html#add-columns

---------------------------------------------------------------------

" I have tried the `ALTER TABLE ADD PARTITION,` but it didn't work. "

Would you mind sharing the code snippet here?

hari
Contributor

Updated the description

Hubert-Dudek
Esteemed Contributor III

Just read it and save it partitioned under the same name. But please back up first!

Hi @Hubert Dudek​ ,

Thanks for the reply.

So, the only way is to read the entire data repartition and then write, correct?

Will this add a new partition if a new value for the partition key comes

Kaniz_Fatma
Community Manager
Community Manager

Hi @Harikrishnan P H​ , We haven’t heard from you since the last response from @Hubert Dudek​ , and I was checking back to see if you have a resolution yet.

If you have any solution, please share it with the community as it can be helpful to others. Otherwise, we will respond with more details and try to help.

Also, Please don't forget to click on the "Select As Best" button whenever the information provided helps resolve your question.

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group