<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Error: Executor Memory Issue with Broadcast Joins in Structured Streaming – Unable to Store 69–80 MB in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/error-executor-memory-issue-with-broadcast-joins-in-structured/m-p/140498#M51444</link>
    <description>&lt;DIV&gt;Hi Community,&lt;/DIV&gt;&lt;DIV&gt;I encountered the following error:&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Failed to store executor broadcast spark_join_relation_1622863 (size = Some(67141632)) in BlockManager with storageLevel=StorageLevel(memory, deserialized, 1 replicas)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;in a Structured Streaming job on Databricks that uses foreachBatch to write to a Delta table.&lt;/DIV&gt;&lt;DIV&gt;I’ve observed that most of the failures occurred when table sizes were in the range of 69–75 MB, and the error suggests that Spark is unable to store the broadcast table in memory.&lt;/DIV&gt;&lt;DIV&gt;When reviewing executor memory usage:&lt;/DIV&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="pooja_bhumandla_0-1764236942720.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/21974iB37E21193788C050/image-size/medium?v=v2&amp;amp;px=400" role="button" title="pooja_bhumandla_0-1764236942720.png" alt="pooja_bhumandla_0-1764236942720.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I noticed there were a few GB of free memory available, but swap usage was also high.&lt;BR /&gt;Given the free memory available, I would expect the executor to be able to hold the 69–80 MB table for broadcasting.&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Why couldn’t the executor hold roughly 80 MB of data when several GB of memory were free?&lt;/LI&gt;&lt;LI&gt;Even if I disable the broadcast setting, I believe MERGE operations still enforce broadcasting internally.&lt;/LI&gt;&lt;LI&gt;Is this error primarily due to the broadcast threshold, or is it related to insufficient memory in the executor?&lt;/LI&gt;&lt;LI&gt;Since the error occurs when the executor cannot hold around 69–80 MB in memory, should I increase the broadcast threshold to 100 MB or decrease it?&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Looking forward to hearing your thoughts and suggestions to solve this error!&lt;/P&gt;</description>
    <pubDate>Thu, 27 Nov 2025 10:02:06 GMT</pubDate>
    <dc:creator>pooja_bhumandla</dc:creator>
    <dc:date>2025-11-27T10:02:06Z</dc:date>
    <item>
      <title>Error: Executor Memory Issue with Broadcast Joins in Structured Streaming – Unable to Store 69–80 MB</title>
      <link>https://community.databricks.com/t5/data-engineering/error-executor-memory-issue-with-broadcast-joins-in-structured/m-p/140498#M51444</link>
      <description>&lt;DIV&gt;Hi Community,&lt;/DIV&gt;&lt;DIV&gt;I encountered the following error:&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Failed to store executor broadcast spark_join_relation_1622863 (size = Some(67141632)) in BlockManager with storageLevel=StorageLevel(memory, deserialized, 1 replicas)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;in a Structured Streaming job on Databricks that uses foreachBatch to write to a Delta table.&lt;/DIV&gt;&lt;DIV&gt;I’ve observed that most of the failures occurred when table sizes were in the range of 69–75 MB, and the error suggests that Spark is unable to store the broadcast table in memory.&lt;/DIV&gt;&lt;DIV&gt;When reviewing executor memory usage:&lt;/DIV&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="pooja_bhumandla_0-1764236942720.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/21974iB37E21193788C050/image-size/medium?v=v2&amp;amp;px=400" role="button" title="pooja_bhumandla_0-1764236942720.png" alt="pooja_bhumandla_0-1764236942720.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I noticed there were a few GB of free memory available, but swap usage was also high.&lt;BR /&gt;Given the free memory available, I would expect the executor to be able to hold the 69–80 MB table for broadcasting.&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Why couldn’t the executor hold roughly 80 MB of data when several GB of memory were free?&lt;/LI&gt;&lt;LI&gt;Even if I disable the broadcast setting, I believe MERGE operations still enforce broadcasting internally.&lt;/LI&gt;&lt;LI&gt;Is this error primarily due to the broadcast threshold, or is it related to insufficient memory in the executor?&lt;/LI&gt;&lt;LI&gt;Since the error occurs when the executor cannot hold around 69–80 MB in memory, should I increase the broadcast threshold to 100 MB or decrease it?&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Looking forward to hearing your thoughts and suggestions to solve this error!&lt;/P&gt;</description>
      <pubDate>Thu, 27 Nov 2025 10:02:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-executor-memory-issue-with-broadcast-joins-in-structured/m-p/140498#M51444</guid>
      <dc:creator>pooja_bhumandla</dc:creator>
      <dc:date>2025-11-27T10:02:06Z</dc:date>
    </item>
    <item>
      <title>Re: Error: Executor Memory Issue with Broadcast Joins in Structured Streaming – Unable to Store 69–8</title>
      <link>https://community.databricks.com/t5/data-engineering/error-executor-memory-issue-with-broadcast-joins-in-structured/m-p/140557#M51463</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/170125"&gt;@pooja_bhumandla&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Join strategy&lt;/LI&gt;&lt;/OL&gt;&lt;UL class=""&gt;&lt;LI&gt;&lt;P class=""&gt;For the Delta MERGE, make sure the large side is not broadcast: set spark.sql.autoBroadcastJoinThreshold to a low value, or to -1 to disable automatic broadcast joins. See&amp;nbsp;&lt;A href="https://kb.databricks.com/sql/bchashjoin-exceeds-bcjointhreshold-oom" target="_blank" rel="noopener"&gt;https://kb.databricks.com/sql/bchashjoin-exceeds-bcjointhreshold-oom&lt;/A&gt;&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;If you are explicitly broadcasting a reference DataFrame, remove the hint or replace it with a shuffle join hint.&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;SPAN&gt;2. &lt;/SPAN&gt;&lt;SPAN&gt;Memory and cluster configuration&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;Increase executor memory and memory overhead if the job is genuinely heavy.&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Reduce the number of cores per executor to give each task more memory headroom.&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;SPAN&gt;3. Micro‑batch load&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;If the source side of the MERGE grows over time, consider limiting the micro‑batch size.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;A href="https://docs.databricks.com/aws/en/structured-streaming/foreach" target="_blank" rel="noopener"&gt;https://docs.databricks.com/aws/en/structured-streaming/foreach&lt;/A&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;If you share your cluster specs (executor memory and cores, threshold settings, and rough sizes of the tables on each side of the MERGE), we can brainstorm further solutions.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 27 Nov 2025 20:56:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-executor-memory-issue-with-broadcast-joins-in-structured/m-p/140557#M51463</guid>
      <dc:creator>ManojkMohan</dc:creator>
      <dc:date>2025-11-27T20:56:02Z</dc:date>
    </item>
    <item>
      <title>Re: Error: Executor Memory Issue with Broadcast Joins in Structured Streaming – Unable to Store 69–8</title>
      <link>https://community.databricks.com/t5/data-engineering/error-executor-memory-issue-with-broadcast-joins-in-structured/m-p/140567#M51467</link>
      <description>&lt;P&gt;&lt;FONT size="4"&gt;&lt;STRONG&gt;What Spark Does During a Broadcast Join&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;Spark identifies the smaller table (say 80 MB).&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;The driver collects this small table to a single JVM.&lt;/LI&gt;&lt;LI&gt;The driver serializes the table into a broadcast variable.&lt;/LI&gt;&lt;LI&gt;The broadcast variable is shipped to all executors.&lt;/LI&gt;&lt;LI&gt;Executors store it inside the BlockManager storage region.&lt;/LI&gt;&lt;LI&gt;Each executor loads it into memory to build a hash map for fast joining.&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Fri, 28 Nov 2025 06:10:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-executor-memory-issue-with-broadcast-joins-in-structured/m-p/140567#M51467</guid>
      <dc:creator>Yogesh_Verma_</dc:creator>
      <dc:date>2025-11-28T06:10:38Z</dc:date>
    </item>
  </channel>
</rss>

