<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Liquid Cluster enabled table - concurrent writes in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/liquid-cluster-enabled-table-concurrent-writes/m-p/104470#M41761</link>
    <description>&lt;P&gt;We ran into a similar issue, and the workaround we tried was to partition on those columns instead, as liquid clustering can sometimes trigger this error.&lt;/P&gt;</description>
    <pubDate>Tue, 07 Jan 2025 09:24:44 GMT</pubDate>
    <dc:creator>TejeshS</dc:creator>
    <dc:date>2025-01-07T09:24:44Z</dc:date>
    <item>
      <title>Liquid Cluster enabled table - concurrent writes</title>
      <link>https://community.databricks.com/t5/data-engineering/liquid-cluster-enabled-table-concurrent-writes/m-p/104432#M41743</link>
      <description>&lt;P data-unlink="true"&gt;I am trying to insert rows into a liquid clustering enabled Delta table using multiple threads. This &lt;A title="Liquid Cluster" href="https://docs.databricks.com/en/delta/clustering.html" target="_self"&gt;link&lt;/A&gt; states that liquid clustering is used for:&amp;nbsp;&lt;SPAN&gt;Tables with concurrent write requirements.&lt;/SPAN&gt;&lt;/P&gt;&lt;P data-unlink="true"&gt;I get this error:&amp;nbsp;&lt;SPAN&gt;[DELTA_CONCURRENT_APPEND] ConcurrentAppendException: Files were added to the root of the table by a concurrent update. Please try the operation again.&lt;/SPAN&gt;&lt;/P&gt;&lt;P data-unlink="true"&gt;How do I insert records in parallel?&lt;/P&gt;&lt;P data-unlink="true"&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Mon, 06 Jan 2025 20:53:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/liquid-cluster-enabled-table-concurrent-writes/m-p/104432#M41743</guid>
      <dc:creator>raghu2</dc:creator>
      <dc:date>2025-01-06T20:53:47Z</dc:date>
    </item>
    <item>
      <title>Re: Liquid Cluster enabled table - concurrent writes</title>
      <link>https://community.databricks.com/t5/data-engineering/liquid-cluster-enabled-table-concurrent-writes/m-p/104433#M41744</link>
      <description>&lt;P class="_1t7bu9h1 paragraph"&gt;To address the &lt;CODE&gt;ConcurrentAppendException&lt;/CODE&gt; error when inserting records in parallel into a liquid clustering enabled Delta table, consider the following approaches:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;&lt;STRONG&gt;Isolation Levels and Write Conflicts&lt;/STRONG&gt;:&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL class="_1t7bu9h7 _1t7bu9h2"&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;Check the isolation level of your table. The default isolation level is &lt;CODE&gt;WriteSerializable&lt;/CODE&gt;, which ensures that write operations are serializable while allowing more concurrency than the stricter &lt;CODE&gt;Serializable&lt;/CODE&gt; level. Switching to &lt;CODE&gt;Serializable&lt;/CODE&gt; will not resolve this error; if you need the stricter level for other reasons, you can set it using the following command:&lt;/SPAN&gt;&lt;/P&gt;
&lt;DIV class="gb5fhw2"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-sql _1t7bu9hb hljs language-sql gb5fhw3"&gt;&lt;SPAN class="hljs-keyword"&gt;ALTER&lt;/SPAN&gt; &lt;SPAN class="hljs-keyword"&gt;TABLE&lt;/SPAN&gt; &lt;SPAN class="hljs-operator"&gt;&amp;lt;&lt;/SPAN&gt;table_name&lt;SPAN class="hljs-operator"&gt;&amp;gt;&lt;/SPAN&gt; &lt;SPAN class="hljs-keyword"&gt;SET&lt;/SPAN&gt; TBLPROPERTIES (&lt;SPAN class="hljs-string"&gt;'delta.isolationLevel'&lt;/SPAN&gt; &lt;SPAN class="hljs-operator"&gt;=&lt;/SPAN&gt; &lt;SPAN class="hljs-string"&gt;'Serializable'&lt;/SPAN&gt;);&lt;/CODE&gt;&lt;/PRE&gt;
&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;Be aware that stricter isolation levels may reduce concurrency and increase the likelihood of conflicts.&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;&lt;STRONG&gt;Row-Level Concurrency&lt;/STRONG&gt;:&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL class="_1t7bu9h7 _1t7bu9h2"&gt;
&lt;LI&gt;&lt;SPAN&gt;Ensure that row-level concurrency is enabled. This feature is generally available in Databricks Runtime 14.2 and above and helps reduce conflicts between concurrent write operations by detecting changes at the row level.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt;Row-level concurrency is supported for tables with deletion vectors enabled and without partitioning, and for tables with liquid clustering unless deletion vectors have been disabled. Check that your table meets these conditions.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Avoiding Conflicts&lt;/STRONG&gt;:&lt;/P&gt;
&lt;UL class="_1t7bu9h7 _1t7bu9h2"&gt;
&lt;LI&gt;
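&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;To avoid conflicts, make the separation explicit in the operation condition. For example, if your table is partitioned by date and country and each job handles a single date and country, add those values as literals to the merge condition. Matching on &lt;CODE&gt;s.date = t.date&lt;/CODE&gt; alone is not explicit enough, because the merge can still scan the whole table and conflict with jobs writing to other partitions:&lt;/SPAN&gt;&lt;/P&gt;
&lt;DIV class="gb5fhw2"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-scala _1t7bu9hb hljs language-python gb5fhw3"&gt;// date and country are placeholder String values for the partition this job writes
deltaTable.&lt;SPAN class="hljs-keyword"&gt;as&lt;/SPAN&gt;(&lt;SPAN class="hljs-string"&gt;"t"&lt;/SPAN&gt;).merge(
  source.&lt;SPAN class="hljs-keyword"&gt;as&lt;/SPAN&gt;(&lt;SPAN class="hljs-string"&gt;"s"&lt;/SPAN&gt;),
  &lt;SPAN class="hljs-string"&gt;"s.user_id = t.user_id AND s.date = t.date AND s.country = t.country"&lt;/SPAN&gt; +
    &lt;SPAN class="hljs-string"&gt;s" AND t.date = '$date' AND t.country = '$country'"&lt;/SPAN&gt;
).whenMatched().updateAll()
 .whenNotMatched().insertAll()
 .execute()&lt;/CODE&gt;&lt;/PRE&gt;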
&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;This ensures that the operations are disjoint and do not conflict with each other.&lt;/P&gt;
&lt;/LI&gt;
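&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;As a minimal Python sketch of making the condition explicit, the helper below builds a merge condition string that joins source and target on the given keys and then pins each target partition column to a literal. The function name &lt;CODE&gt;explicit_merge_condition&lt;/CODE&gt; and the sample values are hypothetical and only illustrate the idea; it assumes each job knows the partition values it writes.&lt;/P&gt;

```python
def explicit_merge_condition(join_keys, partition_values):
    """Join source (s) and target (t) on join_keys, then pin each target
    partition column to a literal value (hypothetical helper)."""
    joins = ["s.{0} = t.{0}".format(key) for key in join_keys]
    # Double any single quote so the value stays a valid SQL string literal.
    pins = [
        "t.{} = '{}'".format(col, str(value).replace("'", "''"))
        for col, value in partition_values.items()
    ]
    return " AND ".join(joins + pins)


# Sample values are assumptions for illustration only.
condition = explicit_merge_condition(
    ["user_id", "date", "country"],
    {"date": "2025-01-06", "country": "US"},
)
print(condition)
```

&lt;P class="_1t7bu9h1 paragraph"&gt;The resulting string would then be passed as the merge condition, so that each parallel job only touches its own date and country.&lt;/P&gt;
&lt;/LI&gt;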
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Sequential Execution&lt;/STRONG&gt;:&lt;/P&gt;
&lt;UL class="_1t7bu9h7 _1t7bu9h2"&gt;
&lt;LI&gt;&lt;SPAN&gt;If conflicts persist, avoid running the final write to the target table in parallel. Instead, let the parallel tasks stage their results and run the final write once they have all finished. This approach can help prevent conflicts that arise from concurrent writes.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;SPAN&gt;&lt;STRONG&gt;Optimize Command&lt;/STRONG&gt;:&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL class="_1t7bu9h7 _1t7bu9h2"&gt;
&lt;LI&gt;&lt;SPAN&gt;Regularly run the &lt;CODE&gt;OPTIMIZE&lt;/CODE&gt; command to ensure that data is efficiently clustered. For tables experiencing many updates or inserts, schedule an &lt;CODE&gt;OPTIMIZE&lt;/CODE&gt; job every one or two hours.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
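&lt;LI&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;&lt;STRONG&gt;Retrying on Conflict&lt;/STRONG&gt;:&lt;/P&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;The error message itself suggests trying the operation again, so one option is to wrap each thread's write in a retry loop. The following is a minimal Python sketch, not a definitive implementation: &lt;CODE&gt;write_with_retry&lt;/CODE&gt;, its parameters, and the stand-in exception class are hypothetical names. In a real job you would likely pass the actual exception type raised by your Delta client (for example &lt;CODE&gt;ConcurrentAppendException&lt;/CODE&gt; from the &lt;CODE&gt;delta.exceptions&lt;/CODE&gt; module of the delta-spark package) through &lt;CODE&gt;retry_on&lt;/CODE&gt;.&lt;/P&gt;

```python
import random
import time


class ConcurrentAppendError(Exception):
    """Stand-in for the real Delta conflict exception (assumption)."""


def write_with_retry(write_fn, max_attempts=5, base_delay=0.5,
                     retry_on=(ConcurrentAppendError,)):
    """Call write_fn, retrying with exponential backoff and jitter on conflict."""
    for attempt in range(1, max_attempts + 1):
        try:
            return write_fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last conflict
            # Wait base_delay, then 2x, 4x, ... scaled by random jitter in [1, 2)
            delay = base_delay * 2 ** (attempt - 1)
            time.sleep(delay * (1 + random.random()))


# Usage with a fake writer that conflicts twice and then succeeds.
calls = {"n": 0}


def flaky_write():
    calls["n"] += 1
    if calls["n"] in (1, 2):
        raise ConcurrentAppendError("simulated conflict")
    return "committed"


result = write_with_retry(flaky_write, base_delay=0.01)
```

&lt;P class="_1t7bu9h1 paragraph"&gt;The random jitter keeps threads that conflicted at the same moment from all retrying at once. Retrying is only safe if the write is idempotent or if a failed attempt commits nothing.&lt;/P&gt;
&lt;/LI&gt;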
&lt;/OL&gt;</description>
      <pubDate>Mon, 06 Jan 2025 20:58:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/liquid-cluster-enabled-table-concurrent-writes/m-p/104433#M41744</guid>
      <dc:creator>Walter_C</dc:creator>
      <dc:date>2025-01-06T20:58:59Z</dc:date>
    </item>
    <item>
      <title>Re: Liquid Cluster enabled table - concurrent writes</title>
      <link>https://community.databricks.com/t5/data-engineering/liquid-cluster-enabled-table-concurrent-writes/m-p/104470#M41761</link>
      <description>&lt;P&gt;We ran into a similar issue, and the workaround we tried was to partition on those columns instead, as liquid clustering can sometimes trigger this error.&lt;/P&gt;</description>
      <pubDate>Tue, 07 Jan 2025 09:24:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/liquid-cluster-enabled-table-concurrent-writes/m-p/104470#M41761</guid>
      <dc:creator>TejeshS</dc:creator>
      <dc:date>2025-01-07T09:24:44Z</dc:date>
    </item>
  </channel>
</rss>

