<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Row-level Concurrency and Liquid Clustering compatibility in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/row-level-concurrency-and-liquid-clustering-compatibility/m-p/57527#M30814</link>
    <description>&lt;P&gt;Cluster-on-write is something being worked on. The limitations at the moment have to do with accommodating streaming workloads.&lt;BR /&gt;&lt;BR /&gt;I found the following informative:&lt;BR /&gt;&lt;BR /&gt;&lt;A href="https://www.youtube.com/watch?v=5t6wX28JC_M" target="_blank"&gt;https://www.youtube.com/watch?v=5t6wX28JC_M&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 17 Jan 2024 00:57:12 GMT</pubDate>
    <dc:creator>JasonThomas</dc:creator>
    <dc:date>2024-01-17T00:57:12Z</dc:date>
    <item>
      <title>Row-level Concurrency and Liquid Clustering compatibility</title>
      <link>https://community.databricks.com/t5/data-engineering/row-level-concurrency-and-liquid-clustering-compatibility/m-p/55930#M30466</link>
      <description>&lt;P&gt;The documentation is a little ambiguous:&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;"Row-level concurrency is only supported on tables without partitioning, which includes tables with liquid clustering."&lt;BR /&gt;&lt;BR /&gt;&lt;A href="https://docs.databricks.com/en/release-notes/runtime/14.2.html" target="_blank" rel="noopener"&gt;https://docs.databricks.com/en/release-notes/runtime/14.2.html&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;"Tables with liquid clustering enabled support row-level concurrency in Databricks Runtime 13.3 LTS and above. Row-level concurrency is generally available in Databricks Runtime 14.2 and above for all tables with deletion vectors enabled."&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;A href="https://docs.databricks.com/en/delta/clustering.html" target="_blank" rel="noopener"&gt;https://docs.databricks.com/en/delta/clustering.html&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;Also, is there a way to enable cluster-on-write for MERGE INTO statements? The documentation states:&lt;/P&gt;&lt;P&gt;"Most operations do not automatically cluster data on write. Operations that cluster on write include the following:"&lt;/P&gt;&lt;UL class=""&gt;&lt;LI&gt;&lt;P&gt;&lt;SPAN class=""&gt;INSERT&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class=""&gt;INTO&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;operations&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;SPAN class=""&gt;CTAS&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;statements&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;SPAN class=""&gt;COPY&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN class=""&gt;INTO&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;from Parquet format&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;SPAN class=""&gt;spark.write.format("delta").mode("append")&lt;/SPAN&gt;&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Fri, 29 Dec 2023 17:00:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/row-level-concurrency-and-liquid-clustering-compatibility/m-p/55930#M30466</guid>
      <dc:creator>JasonThomas</dc:creator>
      <dc:date>2023-12-29T17:00:20Z</dc:date>
    </item>
    <item>
      <title>Re: Row-level Concurrency and Liquid Clustering compatibility</title>
      <link>https://community.databricks.com/t5/data-engineering/row-level-concurrency-and-liquid-clustering-compatibility/m-p/57525#M30813</link>
      <description>&lt;P&gt;&lt;SPAN&gt;It is recommended to use DBR 14.2 or above, which supports row-level concurrency by default. There is currently no way to enable cluster-on-write for&amp;nbsp;&lt;/SPAN&gt;&lt;CODE class=""&gt;MERGE INTO&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;statements, so consider clustering the source data before merging it.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 17 Jan 2024 00:10:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/row-level-concurrency-and-liquid-clustering-compatibility/m-p/57525#M30813</guid>
      <dc:creator>SparkJun</dc:creator>
      <dc:date>2024-01-17T00:10:55Z</dc:date>
    </item>
    <item>
      <title>Re: Row-level Concurrency and Liquid Clustering compatibility</title>
      <link>https://community.databricks.com/t5/data-engineering/row-level-concurrency-and-liquid-clustering-compatibility/m-p/57527#M30814</link>
      <description>&lt;P&gt;Cluster-on-write is something being worked on. The limitations at the moment have to do with accommodating streaming workloads.&lt;BR /&gt;&lt;BR /&gt;I found the following informative:&lt;BR /&gt;&lt;BR /&gt;&lt;A href="https://www.youtube.com/watch?v=5t6wX28JC_M" target="_blank"&gt;https://www.youtube.com/watch?v=5t6wX28JC_M&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 17 Jan 2024 00:57:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/row-level-concurrency-and-liquid-clustering-compatibility/m-p/57527#M30814</guid>
      <dc:creator>JasonThomas</dc:creator>
      <dc:date>2024-01-17T00:57:12Z</dc:date>
    </item>
  </channel>
</rss>

