<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Remove partition column from delta table in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/remove-partition-column-from-delta-table/m-p/88672#M37568</link>
    <description>&lt;P&gt;I have Delta tables with multiple partition columns. I want to remove most of the partition columns and retain just one. I can see there are ALTER TABLE...PARTITION options, but these are not supported for Delta Lake tables. Is there a way to do this, or do I need to recreate the tables?&lt;/P&gt;</description>
    <pubDate>Thu, 05 Sep 2024 08:16:27 GMT</pubDate>
    <dc:creator>AndyG</dc:creator>
    <dc:date>2024-09-05T08:16:27Z</dc:date>
    <item>
      <title>Remove partition column from delta table</title>
      <link>https://community.databricks.com/t5/data-engineering/remove-partition-column-from-delta-table/m-p/88672#M37568</link>
      <description>&lt;P&gt;I have Delta tables with multiple partition columns. I want to remove most of the partition columns and retain just one. I can see there are ALTER TABLE...PARTITION options, but these are not supported for Delta Lake tables. Is there a way to do this, or do I need to recreate the tables?&lt;/P&gt;</description>
      <pubDate>Thu, 05 Sep 2024 08:16:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/remove-partition-column-from-delta-table/m-p/88672#M37568</guid>
      <dc:creator>AndyG</dc:creator>
      <dc:date>2024-09-05T08:16:27Z</dc:date>
    </item>
    <item>
      <title>Re: Remove partition column from delta table</title>
      <link>https://community.databricks.com/t5/data-engineering/remove-partition-column-from-delta-table/m-p/88673#M37569</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/119220"&gt;@AndyG&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;Maybe try the approach the official Delta Lake guide suggests:&lt;BR /&gt;&lt;BR /&gt;&lt;A href="https://delta.io/blog/2023-01-18-add-remove-partition-delta-lake/#:~:text=Remove%20Partition%20from%20Delta%20Lake,partition%20from%20the%20Delta%20table." target="_blank" rel="noopener"&gt;Adding and Deleting Partitions in Delta Lake tables | Delta Lake&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P class=""&gt;You can delete all rows from a given partition to remove the partition from the Delta table.&lt;/P&gt;&lt;P class=""&gt;Here’s how to delete all the rows with individuals from Argentina.&lt;/P&gt;&lt;DIV class=""&gt;&lt;PRE&gt;# Imports the snippet relies on (an active SparkSession named spark is assumed)
import delta
import pyspark.sql.functions as F

&lt;SPAN class=""&gt;dt&lt;/SPAN&gt; = &lt;SPAN class=""&gt;delta&lt;/SPAN&gt;&lt;SPAN class=""&gt;.DeltaTable&lt;/SPAN&gt;&lt;SPAN class=""&gt;.forName&lt;/SPAN&gt;(spark, &lt;SPAN class=""&gt;"country_people"&lt;/SPAN&gt;)

&lt;SPAN class=""&gt;dt&lt;/SPAN&gt;&lt;SPAN class=""&gt;.delete&lt;/SPAN&gt;(F.&lt;SPAN class=""&gt;col&lt;/SPAN&gt;(&lt;SPAN class=""&gt;"country"&lt;/SPAN&gt;) == &lt;SPAN class=""&gt;"Argentina"&lt;/SPAN&gt;)&lt;/PRE&gt;&lt;/DIV&gt;&lt;P class=""&gt;Let’s run the vacuum twice and observe how the Argentina partition is deleted from the filesystem.&lt;/P&gt;&lt;DIV class=""&gt;&lt;PRE&gt;spark.conf.&lt;SPAN class=""&gt;set&lt;/SPAN&gt;(&lt;SPAN class=""&gt;"spark.databricks.delta.retentionDurationCheck.enabled"&lt;/SPAN&gt;, &lt;SPAN class=""&gt;"false"&lt;/SPAN&gt;)

spark.sql(&lt;SPAN class=""&gt;"VACUUM country_people RETAIN 0 HOURS"&lt;/SPAN&gt;)
spark.sql(&lt;SPAN class=""&gt;"VACUUM country_people RETAIN 0 HOURS"&lt;/SPAN&gt;)&lt;/PRE&gt;&lt;/DIV&gt;&lt;P class=""&gt;NOTE: We’re only setting the retention period to 0 hours in this example to demonstrate disk structure changes. The retention period should normally be at least 7 days. A retention period of 0 hours is dangerous because it can break concurrent write operations and time travel.&lt;/P&gt;&lt;P class=""&gt;View the contents of the filesystem and make sure that the Argentina partition was removed.&lt;/P&gt;&lt;DIV class=""&gt;&lt;PRE&gt;spark-warehouse/country_people
├── _delta_log
│   ├── 00000000000000000000.json
│   ├── 00000000000000000001.json
│   └── 00000000000000000002.json
├── &lt;SPAN class=""&gt;country&lt;/SPAN&gt;=China
│   └── part-00000-9a8d67fa-c23d-41a4-b570-a45405f9ad78.c000.snappy.parquet
├── &lt;SPAN class=""&gt;country&lt;/SPAN&gt;=Colombia
│   └── part-00000-7e3d3d49-39e9-4eb2-ab92-22a485291f91.c000.snappy.parquet
└── &lt;SPAN class=""&gt;country&lt;/SPAN&gt;=Russia
    └── part-00000-c49ca623-ea69-4088-8d85-c7c2de30cc28.c000.snappy.parquet&lt;/PRE&gt;&lt;/DIV&gt;&lt;P class=""&gt;You need to run vacuum twice to completely remove the Argentina partition. The first vacuum run deletes the files with Argentina data, and the Argentina directory becomes empty. The second vacuum run deletes the empty Argentina directory. You don’t normally have to run vacuum twice for all changes to take effect, but this is a special edge case. See&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="" href="https://delta.io/blog/2023-01-03-delta-lake-vacuum-command/" target="_blank" rel="noopener"&gt;this blog post&lt;/A&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;to learn more about the vacuum command.&lt;/P&gt;</description>
      <pubDate>Thu, 05 Sep 2024 08:21:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/remove-partition-column-from-delta-table/m-p/88673#M37569</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2024-09-05T08:21:03Z</dc:date>
    </item>
    <item>
      <title>Re: Remove partition column from delta table</title>
      <link>https://community.databricks.com/t5/data-engineering/remove-partition-column-from-delta-table/m-p/88678#M37571</link>
      <description>&lt;P&gt;I'm not looking to delete individual partitions, but to change the way the tables are actually partitioned. They use multiple columns as partitions, and I want to remove most of them and retain just one, so that each table is partitioned by a single column.&lt;/P&gt;</description>
      <pubDate>Thu, 05 Sep 2024 08:27:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/remove-partition-column-from-delta-table/m-p/88678#M37571</guid>
      <dc:creator>AndyG</dc:creator>
      <dc:date>2024-09-05T08:27:28Z</dc:date>
    </item>
    <item>
      <title>Re: Remove partition column from delta table</title>
      <link>https://community.databricks.com/t5/data-engineering/remove-partition-column-from-delta-table/m-p/88681#M37573</link>
      <description>&lt;P&gt;Hi Slash,&lt;/P&gt;&lt;P&gt;I've seen it mentioned that one way to do this (and the way Databricks recommends) is to use REPLACE TABLE. I've tried that in the past, but it drops any auto-increment columns on the replaced table, which is a problem for us.&lt;/P&gt;</description>
      <pubDate>Thu, 05 Sep 2024 08:41:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/remove-partition-column-from-delta-table/m-p/88681#M37573</guid>
      <dc:creator>AndySkinner</dc:creator>
      <dc:date>2024-09-05T08:41:43Z</dc:date>
    </item>
    <item>
      <title>Re: Remove partition column from delta table</title>
      <link>https://community.databricks.com/t5/data-engineering/remove-partition-column-from-delta-table/m-p/88682#M37574</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/119220"&gt;@AndyG&lt;/a&gt;&amp;nbsp;,&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/119371"&gt;@AndySkinner&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Yeah, I misunderstood the question. I would do it the following way:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;REPLACE TABLE &amp;lt;tablename&amp;gt;
  USING DELTA
  PARTITIONED BY (column_name)
AS
 SELECT * FROM &amp;lt;tablename&amp;gt;&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/en/delta/best-practices.html#replace-the-content-or-schema-of-a-table" target="_blank"&gt;Best practices: Delta Lake | Databricks on AWS&lt;/A&gt;&amp;nbsp;&lt;/P&gt;</description>
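When several tables need the same change, the statement above could be generated per table. This is only a minimal sketch: `build_repartition_sql` and `quote_ident` are hypothetical helpers, and `sales.events` and `event_date` are made-up names. The code only builds the SQL text, which would then be passed to `spark.sql(...)` on a cluster. As reported elsewhere in this thread, replacing a table this way drops auto-increment column definitions.

```python
def quote_ident(name):
    """Backtick-quote each dot-separated part of a table or column name."""
    parts = name.split(".")
    return ".".join("`" + p.replace("`", "``") + "`" for p in parts)


def build_repartition_sql(table, partition_col):
    """Build a REPLACE TABLE ... AS SELECT statement that rewrites `table`
    partitioned by a single column (hypothetical helper, not a Delta API)."""
    tbl = quote_ident(table)
    col = quote_ident(partition_col)
    return (
        f"REPLACE TABLE {tbl}\n"
        f"  USING DELTA\n"
        f"  PARTITIONED BY ({col})\n"
        f"AS\n"
        f"  SELECT * FROM {tbl}"
    )


# Example with made-up names; on Databricks you would run spark.sql(stmt).
stmt = build_repartition_sql("sales.events", "event_date")
print(stmt)
```

Quoting the identifiers keeps the generated statement valid when a name contains unusual characters. Simple dotted names such as `schema.table` are split and quoted part by part.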
      <pubDate>Thu, 05 Sep 2024 08:43:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/remove-partition-column-from-delta-table/m-p/88682#M37574</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2024-09-05T08:43:44Z</dc:date>
    </item>
    <item>
      <title>Re: Remove partition column from delta table</title>
      <link>https://community.databricks.com/t5/data-engineering/remove-partition-column-from-delta-table/m-p/88684#M37575</link>
      <description>&lt;P&gt;Hey &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/110502"&gt;@szymon_dybczak&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;This removes the auto-increment column from the newly created table, which is a big problem.&lt;/P&gt;&lt;P&gt;Andy&lt;/P&gt;</description>
      <pubDate>Thu, 05 Sep 2024 08:55:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/remove-partition-column-from-delta-table/m-p/88684#M37575</guid>
      <dc:creator>AndySkinner</dc:creator>
      <dc:date>2024-09-05T08:55:35Z</dc:date>
    </item>
    <item>
      <title>Re: Remove partition column from delta table</title>
      <link>https://community.databricks.com/t5/data-engineering/remove-partition-column-from-delta-table/m-p/88686#M37577</link>
      <description>&lt;P&gt;Yep, but I don't think there is a way to do it without messing up the auto-increment column. Maybe someone else can share an idea...&lt;/P&gt;</description>
      <pubDate>Thu, 05 Sep 2024 09:34:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/remove-partition-column-from-delta-table/m-p/88686#M37577</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2024-09-05T09:34:19Z</dc:date>
    </item>
  </channel>
</rss>

