cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Remove partition column from delta table

AndyG
New Contributor II

I have delta tables with multiple partition columns. I want to remove most of the partition columns and retain just one. I can see there are ALTER TABLE...PARTITION options but these are not supported for delta lake tables. So is there a way to do this - or do I need to recreate the tables.

6 REPLIES 6

szymon_dybczak
Contributor

Hi @AndyG ,

Maybe try the way official delta guid is suggesting:

Adding and Deleting Partitions in Delta Lake tables | Delta Lake

You can delete all rows from a given partition to remove the partition from the Delta table.

Hereโ€™s how to delete all the rows with individuals from Argentina.

 Copy
dt = delta.DeltaTable.forName(spark, "country_people")

dt.delete(F.col("country") == "Argentina")

Letโ€™s run the vacuum twice and observe how the Argentina partition is deleted from the filesystem.

 Copy
spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")

spark.sql("VACUUM country_people RETAIN 0 HOURS")
spark.sql("VACUUM country_people RETAIN 0 HOURS")

NOTE: Weโ€™re only setting the retention period to 0 hours in this example to demonstrate disk structure changes. The retention period should normally be at least 7 days. A retention period of 0 hours is dangerous because it can break concurrent write operations and time travel.

View the contents of the filesystem and make sure that the Argentina partition was removed.

 Copy
spark-warehouse/country_people
โ”œโ”€โ”€ _delta_log
โ”‚   โ”œโ”€โ”€ 00000000000000000000.json
โ”‚   โ”œโ”€โ”€ 00000000000000000001.json
โ”‚   โ””โ”€โ”€ 00000000000000000002.json
โ”œโ”€โ”€ country=China
โ”‚   โ””โ”€โ”€ part-00000-9a8d67fa-c23d-41a4-b570-a45405f9ad78.c000.snappy.parquet
โ”œโ”€โ”€ country=Colombia
โ”‚   โ””โ”€โ”€ part-00000-7e3d3d49-39e9-4eb2-ab92-22a485291f91.c000.snappy.parquet
โ””โ”€โ”€ country=Russia
    โ””โ”€โ”€ part-00000-c49ca623-ea69-4088-8d85-c7c2de30cc28.c000.snappy.parquet

You need to run vacuum twice to completely remove the Argentina partition. The first vacuum run deletes the files with Argentina data, and the Argentina directory becomes empty. The second vacuum run deletes the empty Argentina directory. You donโ€™t normally have to run vacuum twice for all changes to take effect, but this is a special edge case. See this blog post to learn more about the vacuum command.

 

 

I'm not looking to delete individual partitions, but change the way the tables are actually partitioned. They use multiple columns as partitions - I want to remove most of the columns and retain just one column. So the tables are partitioned using one column only.

Hi Slash,

I`ve seen mention that one way to do it (and a recommended by databricks way) is to use REPLACE TABLE.  I`ve personally tried that in the past but it drops any auto incremement columns on the replaced table which is a problem for us.

Hi @AndyG , @AndySkinner 

Yeah, I misunderstood the question. I would do this in following way:

 

REPLACE TABLE <tablename>
  USING DELTA
  PARTITIONED BY (column_name)
AS
 SELECT * FROM <tablename>

 

Best practices: Delta Lake | Databricks on AWS 

Hey @szymon_dybczak ,

This removes the auto increment column on the new table that gets created, which is a big problem

Andy

Yep, but I don't think there is a way do it without messing up with auto increment. But maybe someone share some idea...

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group