<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Update a part of parquet partition is deleting existing data. in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/update-a-part-of-parquet-partition-is-deleting-existing-data/m-p/36710#M5405</link>
    <description>&lt;P&gt;I have 4 months of data, partitioned on the Year and Month columns, so my Parquet partitions look like this:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="JGAICT_1-1688364696457.jpeg" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/2743i52A9E78FA2527694/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="JGAICT_1-1688364696457.jpeg" alt="JGAICT_1-1688364696457.jpeg" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="JGAICT_0-1688364651334.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/2742i3FD221571CAC03A7/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="JGAICT_0-1688364651334.png" alt="JGAICT_0-1688364651334.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="JGAICT_2-1688364739675.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/2744i7A9DE20B162B9F21/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="JGAICT_2-1688364739675.png" alt="JGAICT_2-1688364739675.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Each monthly partition folder contains the data in Parquet format.&lt;/P&gt;&lt;P&gt;Then I loaded data for July and also modified some values in August. After generating the required data, I saved the output with the same partitioning (Year, Month) as before. This time, however, the data contained only those 2 months, with no entries for September, October, or November. 
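&lt;/P&gt;&lt;P&gt;A minimal PySpark sketch that reproduces this sequence. It is a hypothetical reconstruction rather than the exact code used here: the output path, the Year value and the Value column are placeholders, and it assumes both saves use mode("overwrite") with spark.sql.sources.partitionOverwriteMode left at its open-source Spark default, static.&lt;/P&gt;&lt;PRE&gt;
import glob
import os
import tempfile

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.master("local[1]")
    .appName("partition-overwrite-repro")
    .getOrCreate()
)

# Hypothetical output location; the real path is not shown in the post.
path = os.path.join(tempfile.mkdtemp(), "monthly_data")

# First save: four months (August to November) with a placeholder Year and Value.
first = spark.createDataFrame(
    [(2023, m, float(m)) for m in (8, 9, 10, 11)],
    ["Year", "Month", "Value"],
)
first.write.mode("overwrite").partitionBy("Year", "Month").parquet(path)

# Second save: only July (new) and August (modified values).
second = spark.createDataFrame(
    [(2023, 7, 7.0), (2023, 8, 80.0)],
    ["Year", "Month", "Value"],
)
# With partitionOverwriteMode=static (the assumed setting), an "overwrite"
# save clears everything under `path` before writing the new partitions.
second.write.mode("overwrite").partitionBy("Year", "Month").parquet(path)


def parquet_files(month):
    """List the Parquet data files left in one Month partition folder."""
    pattern = os.path.join(path, "Year=2023", f"Month={month}", "*.parquet")
    return glob.glob(pattern)


for month in (7, 8, 9, 10, 11):
    print(month, len(parquet_files(month)))
&lt;/PRE&gt;&lt;P&gt;The script prints how many Parquet files remain in each Month folder after the second save.&lt;/P&gt;&lt;P&gt;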
The result is:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="JGAICT_3-1688365438030.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/2745iBB7C5F28840D46BC/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="JGAICT_3-1688365438030.png" alt="JGAICT_3-1688365438030.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="JGAICT_4-1688365625614.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/2746i0DBD452482D77AA8/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="JGAICT_4-1688365625614.png" alt="JGAICT_4-1688365625614.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Interestingly, when I look into the September folder, the Parquet files are missing, and the same is true for October and November. According to their last-modified timestamps, these folders were also modified.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="JGAICT_6-1688365838569.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/2748i3CF74DE0733AB72B/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="JGAICT_6-1688365838569.png" alt="JGAICT_6-1688365838569.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="JGAICT_5-1688365699622.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/2747iF1B388106EEF7D54/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="JGAICT_5-1688365699622.png" alt="JGAICT_5-1688365699622.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;My guess is that the files went missing because the second write overwrote the first one, since the data is also partitioned on Year. Is there a way to avoid this and keep the existing files from being deleted?&lt;/P&gt;</description>
    <pubDate>Mon, 03 Jul 2023 06:56:06 GMT</pubDate>
    <dc:creator>JGAICT</dc:creator>
    <dc:date>2023-07-03T06:56:06Z</dc:date>
    <item>
      <title>Update a part of parquet partition is deleting existing data.</title>
      <link>https://community.databricks.com/t5/get-started-discussions/update-a-part-of-parquet-partition-is-deleting-existing-data/m-p/36710#M5405</link>
      <pubDate>Mon, 03 Jul 2023 06:56:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/update-a-part-of-parquet-partition-is-deleting-existing-data/m-p/36710#M5405</guid>
      <dc:creator>JGAICT</dc:creator>
      <dc:date>2023-07-03T06:56:06Z</dc:date>
    </item>
  </channel>
</rss>

