<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Error writing a partitioned Delta Table from a multitasking job in azure databricks in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/error-writing-a-partitioned-delta-table-from-a-multitasking-job/m-p/28219#M20042</link>
    <description>&lt;P&gt;Hi @Daniel Vera, thanks for the details. I am not sure about your data, but I hope partitioning by multiple columns does not end up producing many small files.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Let's see whether Databricks provides an option to apply an explicit lock on tables for parallel operations.&lt;/P&gt;</description>
    <pubDate>Thu, 17 Feb 2022 04:12:40 GMT</pubDate>
    <dc:creator>RKNutalapati</dc:creator>
    <dc:date>2022-02-17T04:12:40Z</dc:date>
    <item>
      <title>Error writing a partitioned Delta Table from a multitasking job in azure databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/error-writing-a-partitioned-delta-table-from-a-multitasking-job/m-p/28216#M20039</link>
      <description>&lt;P&gt;I have a notebook that writes to a Delta table with a statement similar to the following:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;from delta.tables import DeltaTable

# Merge condition on the two partition columns (country, process_date)
match = "current.country = updates.country and current.process_date = updates.process_date"
deltaTable = DeltaTable.forPath(spark, silver_path)
deltaTable.alias("current")\
.merge(
    data.alias("updates"),
    match) \
  .whenMatchedUpdate(
      set = update_set,
      condition = condition) \
  .whenNotMatchedInsert(values = values_set)\
  .execute()&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;The multitask job has two tasks that run in parallel.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="eb3tr"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/2077i5252667B039A69A3/image-size/large?v=v2&amp;amp;px=999" role="button" title="eb3tr" alt="eb3tr" /&gt;&lt;/span&gt;When the job runs, the following error is displayed:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;ConcurrentAppendException: Files were added to partition [country=Panamá, process_date=2022-01-01 00:00:00] by a concurrent update. Please try the operation again.&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Each task receives a different country (Panama, Ecuador) and the same date as parameters, so each run should write only the data for its own country. The Delta table is partitioned by the country and process_date fields. Any ideas what I'm doing wrong? How should I specify the partition to be affected when using the "merge" statement?&lt;/P&gt;&lt;P&gt;I would appreciate it if you could clarify how to work with partitions in these cases, since this is new to me.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Update:&lt;/B&gt; I adjusted the condition to specify the country and process date, as indicated &lt;A href="https://docs.databricks.com/delta/concurrency-control.html#id1" alt="https://docs.databricks.com/delta/concurrency-control.html#id1" target="_blank"&gt;here (ConcurrentAppendException)&lt;/A&gt;. Now I get the following error message:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;ConcurrentAppendException: Files were added to the root of the table by a concurrent update. Please try the operation again.&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I can't think of what could cause the error. I will keep investigating.&lt;/P&gt;</description>
      <pubDate>Mon, 14 Feb 2022 15:55:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-writing-a-partitioned-delta-table-from-a-multitasking-job/m-p/28216#M20039</guid>
      <dc:creator>danielveraec</dc:creator>
      <dc:date>2022-02-14T15:55:08Z</dc:date>
    </item>
    <item>
      <title>Re: Error writing a partitioned Delta Table from a multitasking job in azure databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/error-writing-a-partitioned-delta-table-from-a-multitasking-job/m-p/28217#M20040</link>
      <description>&lt;P&gt;I think we have to break the DML operations into multiple tasks, as in the screenshot below, to make parallel operations work on a Delta table (assuming the table is partitioned).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/2084iCA04304FE8F7FA16/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;Let me know if this works.&lt;/P&gt;</description>
      <pubDate>Wed, 16 Feb 2022 18:18:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-writing-a-partitioned-delta-table-from-a-multitasking-job/m-p/28217#M20040</guid>
      <dc:creator>RKNutalapati</dc:creator>
      <dc:date>2022-02-16T18:18:42Z</dc:date>
    </item>
    <item>
      <title>Re: Error writing a partitioned Delta Table from a multitasking job in azure databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/error-writing-a-partitioned-delta-table-from-a-multitasking-job/m-p/28218#M20041</link>
      <description>&lt;P&gt;Initially, the affected table was partitioned only by a date field, so I repartitioned it by the country and date fields. The new partitioning created the country and date directories; however, the old date-partition directories remained and were not deleted.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Apparently these directories were causing the conflict when they were read concurrently. I created a new Delta table at another path with the correct partitions and then used it to replace the table at the original path. This allowed the old partition directories to be removed.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The only consequence of these actions was that I lost the change history of the table (time travel).&lt;/P&gt;</description>
      <pubDate>Wed, 16 Feb 2022 19:20:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-writing-a-partitioned-delta-table-from-a-multitasking-job/m-p/28218#M20041</guid>
      <dc:creator>danielveraec</dc:creator>
      <dc:date>2022-02-16T19:20:26Z</dc:date>
    </item>
    <item>
      <title>Re: Error writing a partitioned Delta Table from a multitasking job in azure databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/error-writing-a-partitioned-delta-table-from-a-multitasking-job/m-p/28219#M20042</link>
      <description>&lt;P&gt;Hi @Daniel Vera, thanks for the details. I am not sure about your data, but I hope partitioning by multiple columns does not end up producing many small files.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Let's see whether Databricks provides an option to apply an explicit lock on tables for parallel operations.&lt;/P&gt;</description>
      <pubDate>Thu, 17 Feb 2022 04:12:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-writing-a-partitioned-delta-table-from-a-multitasking-job/m-p/28219#M20042</guid>
      <dc:creator>RKNutalapati</dc:creator>
      <dc:date>2022-02-17T04:12:40Z</dc:date>
    </item>
  </channel>
</rss>

