<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic You can use Low Shuffle Merge to optimize the Merge process in Delta Lake in Community Articles</title>
    <link>https://community.databricks.com/t5/community-articles/you-can-use-low-shuffle-merge-to-optimize-the-merge-process-in/m-p/95065#M307</link>
    <description>&lt;P&gt;Low Shuffle Merge is a Databricks feature that optimizes MERGE operations on Delta Lake tables by reducing the amount of data shuffled between nodes.&lt;/P&gt;&lt;P&gt;- A traditional merge can involve heavy data shuffling, because data is redistributed across the cluster to ensure the merge is correct.&lt;/P&gt;&lt;P&gt;- With Low Shuffle Merge, only a subset of the data is shuffled, which improves performance and reduces the cost of merge operations.&lt;/P&gt;&lt;P&gt;The benefits of Low Shuffle Merge are:&lt;/P&gt;&lt;P&gt;1. Faster Execution: Less data is shuffled, so merge operations finish sooner.&lt;/P&gt;&lt;P&gt;2. Cost Efficiency: Fewer shuffle operations mean lower resource consumption (CPU, memory), which reduces overall cloud costs.&lt;/P&gt;&lt;P&gt;3. Scalability: Merges on large datasets perform better, so workloads scale further.&lt;/P&gt;&lt;P&gt;4. Better Cluster Utilization: Network traffic drops and cluster resources are used more efficiently.&lt;/P&gt;&lt;P&gt;This feature is particularly useful in large-scale data processing scenarios that merge frequently, such as updating or deleting records in Delta tables.&lt;/P&gt;&lt;P&gt;Low Shuffle Merge is enabled by default in Databricks Runtime 10.4 and above. On earlier runtimes, enable it by setting the following configuration:&lt;BR /&gt;spark.databricks.delta.merge.enableLowShuffle = true&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/en/optimizations/low-shuffle-merge.html" target="_blank"&gt;https://docs.databricks.com/en/optimizations/low-shuffle-merge.html&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Sun, 20 Oct 2024 08:23:55 GMT</pubDate>
    <dc:creator>Sourav-Kundu</dc:creator>
    <dc:date>2024-10-20T08:23:55Z</dc:date>
    <item>
      <title>You can use Low Shuffle Merge to optimize the Merge process in Delta Lake</title>
      <link>https://community.databricks.com/t5/community-articles/you-can-use-low-shuffle-merge-to-optimize-the-merge-process-in/m-p/95065#M307</link>
      <description>&lt;P&gt;Low Shuffle Merge is a Databricks feature that optimizes MERGE operations on Delta Lake tables by reducing the amount of data shuffled between nodes.&lt;/P&gt;&lt;P&gt;- A traditional merge can involve heavy data shuffling, because data is redistributed across the cluster to ensure the merge is correct.&lt;/P&gt;&lt;P&gt;- With Low Shuffle Merge, only a subset of the data is shuffled, which improves performance and reduces the cost of merge operations.&lt;/P&gt;&lt;P&gt;The benefits of Low Shuffle Merge are:&lt;/P&gt;&lt;P&gt;1. Faster Execution: Less data is shuffled, so merge operations finish sooner.&lt;/P&gt;&lt;P&gt;2. Cost Efficiency: Fewer shuffle operations mean lower resource consumption (CPU, memory), which reduces overall cloud costs.&lt;/P&gt;&lt;P&gt;3. Scalability: Merges on large datasets perform better, so workloads scale further.&lt;/P&gt;&lt;P&gt;4. Better Cluster Utilization: Network traffic drops and cluster resources are used more efficiently.&lt;/P&gt;&lt;P&gt;This feature is particularly useful in large-scale data processing scenarios that merge frequently, such as updating or deleting records in Delta tables.&lt;/P&gt;&lt;P&gt;Low Shuffle Merge is enabled by default in Databricks Runtime 10.4 and above. On earlier runtimes, enable it by setting the following configuration:&lt;BR /&gt;spark.databricks.delta.merge.enableLowShuffle = true&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/en/optimizations/low-shuffle-merge.html" target="_blank"&gt;https://docs.databricks.com/en/optimizations/low-shuffle-merge.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 20 Oct 2024 08:23:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/you-can-use-low-shuffle-merge-to-optimize-the-merge-process-in/m-p/95065#M307</guid>
      <dc:creator>Sourav-Kundu</dc:creator>
      <dc:date>2024-10-20T08:23:55Z</dc:date>
    </item>
    <item>
      <title>Re: You can use Low Shuffle Merge to optimize the Merge process in Delta Lake</title>
      <link>https://community.databricks.com/t5/community-articles/you-can-use-low-shuffle-merge-to-optimize-the-merge-process-in/m-p/96513#M308</link>
      <description>&lt;P&gt;Great post,&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/128004"&gt;@Sourav-Kundu&lt;/a&gt;.&amp;nbsp;The benefits you've outlined, especially regarding faster execution and cost efficiency, are valuable for anyone working with large-scale data processing.&amp;nbsp;Thanks for sharing!&lt;/P&gt;</description>
      <pubDate>Mon, 28 Oct 2024 14:01:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/you-can-use-low-shuffle-merge-to-optimize-the-merge-process-in/m-p/96513#M308</guid>
      <dc:creator>Advika_</dc:creator>
      <dc:date>2024-10-28T14:01:48Z</dc:date>
    </item>
  </channel>
</rss>

