<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Smaller dataset causing OOM on large cluster in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/smaller-dataset-causing-oom-on-large-cluster/m-p/117802#M9968</link>
    <description>&lt;P&gt;I have a PySpark job reading ~50-55GB of Parquet data from a Delta table on Databricks. The job uses n2-highmem-4 GCP VMs with 1-15 workers and autoscaling. Each worker VM of type n2-highmem-4 has 32GB memory and 4 cores, and each VM runs one executor. 22GB is allocated per executor, i.e. 22*15=330GB overall executor memory, which seems large enough for ~55GB of input data. Shuffle partitions are set to 200, but I'm getting an OOM error.&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Input data volume: 55GB&lt;/LI&gt;&lt;LI&gt;Number of workers: 1-15 n2-highmem-4 GCP VMs with autoscaling&lt;/LI&gt;&lt;LI&gt;Executors per worker: 1&lt;/LI&gt;&lt;LI&gt;Cores per executor (or worker): 4, i.e. only 4 tasks can run in parallel&lt;/LI&gt;&lt;LI&gt;Shuffle partitions: 200&lt;/LI&gt;&lt;LI&gt;Partitions per worker: 200/15 = ~13&lt;/LI&gt;&lt;LI&gt;Data per partition: 55GB/200 = ~275MB (this is just an average; with skew, some partitions will have much more data. Is there a way to figure this out from the Spark UI?)&lt;/LI&gt;&lt;LI&gt;Overall executor memory: 22*15=330GB&amp;nbsp;&lt;UL&gt;&lt;LI&gt;Spark memory&amp;nbsp;(storage+execution) per worker =&amp;nbsp;0.6*(22000MB-300MB) = ~13GB&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Could you please help me understand why this is not sufficient and leads to OOM? Also, is it necessary for all ~13 partitions assigned to an executor to fit in memory at once, or, since only 4 tasks run in parallel per executor, is it sufficient for memory to accommodate just 4 partitions at a time?&lt;/P&gt;</description>
    <pubDate>Tue, 06 May 2025 05:52:56 GMT</pubDate>
    <dc:creator>Klusener</dc:creator>
    <dc:date>2025-05-06T05:52:56Z</dc:date>
    <item>
      <title>Smaller dataset causing OOM on large cluster</title>
      <link>https://community.databricks.com/t5/get-started-discussions/smaller-dataset-causing-oom-on-large-cluster/m-p/117802#M9968</link>
      <description>&lt;P&gt;I have a PySpark job reading ~50-55GB of Parquet data from a Delta table on Databricks. The job uses n2-highmem-4 GCP VMs with 1-15 workers and autoscaling. Each worker VM of type n2-highmem-4 has 32GB memory and 4 cores, and each VM runs one executor. 22GB is allocated per executor, i.e. 22*15=330GB overall executor memory, which seems large enough for ~55GB of input data. Shuffle partitions are set to 200, but I'm getting an OOM error.&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Input data volume: 55GB&lt;/LI&gt;&lt;LI&gt;Number of workers: 1-15 n2-highmem-4 GCP VMs with autoscaling&lt;/LI&gt;&lt;LI&gt;Executors per worker: 1&lt;/LI&gt;&lt;LI&gt;Cores per executor (or worker): 4, i.e. only 4 tasks can run in parallel&lt;/LI&gt;&lt;LI&gt;Shuffle partitions: 200&lt;/LI&gt;&lt;LI&gt;Partitions per worker: 200/15 = ~13&lt;/LI&gt;&lt;LI&gt;Data per partition: 55GB/200 = ~275MB (this is just an average; with skew, some partitions will have much more data. Is there a way to figure this out from the Spark UI?)&lt;/LI&gt;&lt;LI&gt;Overall executor memory: 22*15=330GB&amp;nbsp;&lt;UL&gt;&lt;LI&gt;Spark memory&amp;nbsp;(storage+execution) per worker =&amp;nbsp;0.6*(22000MB-300MB) = ~13GB&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Could you please help me understand why this is not sufficient and leads to OOM? Also, is it necessary for all ~13 partitions assigned to an executor to fit in memory at once, or, since only 4 tasks run in parallel per executor, is it sufficient for memory to accommodate just 4 partitions at a time?&lt;/P&gt;</description>
      <pubDate>Tue, 06 May 2025 05:52:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/smaller-dataset-causing-oom-on-large-cluster/m-p/117802#M9968</guid>
      <dc:creator>Klusener</dc:creator>
      <dc:date>2025-05-06T05:52:56Z</dc:date>
    </item>
    <item>
      <title>Re: Smaller dataset causing OOM on large cluster</title>
      <link>https://community.databricks.com/t5/get-started-discussions/smaller-dataset-causing-oom-on-large-cluster/m-p/117991#M9969</link>
      <description>&lt;DIV class="paragraph"&gt;The OutOfMemory (OOM) issue you're experiencing in your PySpark job could stem from several factors. Here's a breakdown of potential causes and mitigation strategies:&lt;/DIV&gt;
&lt;OL start="1"&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;&lt;STRONG&gt;Skew in Data Partitions&lt;/STRONG&gt;:
&lt;UL&gt;
&lt;LI&gt;Based on your calculation, the data size per partition is approximately 275 MB. However, due to possible data skew, some partitions could be significantly larger and overwhelm the executor memory. To investigate skew, you can check the Spark UI:
&lt;UL&gt;
&lt;LI&gt;Navigate to the "Stages" tab of the Spark UI.&lt;/LI&gt;
&lt;LI&gt;For failed stages, examine partition sizes in the stage detail summary.&lt;/LI&gt;
&lt;LI&gt;If some partitions are exceptionally large compared to others, this indicates skew.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;To address skew:
&lt;UL&gt;
&lt;LI&gt;Increase the number of shuffle partitions beyond 200 to distribute data more evenly.&lt;/LI&gt;
&lt;LI&gt;Use Adaptive Query Execution (AQE), which dynamically coalesces skewed partitions at runtime. Enable this with: &lt;CODE&gt;spark.conf.set("spark.sql.adaptive.enabled", "true")&lt;/CODE&gt;&lt;/LI&gt;
&lt;LI&gt;Consider using Spark’s &lt;CODE&gt;skew&lt;/CODE&gt; hints to handle skewed joins or aggregations.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
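Beyond the basic flag, here is a hedged sketch of the AQE knobs that specifically target skew (property names are stock Apache Spark settings; the values are illustrative starting points, not recommendations tuned to this workload, and `spark` is the Databricks-provided SparkSession):

```python
# Sketch: AQE settings aimed at skewed shuffle partitions.
# Assumes an existing SparkSession named `spark`; values are illustrative.
spark.conf.set("spark.sql.adaptive.enabled", "true")
# Split partitions that AQE detects as skewed during sort-merge joins
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
# Coalesce many small post-shuffle partitions into fewer, evenly sized ones
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
# Target size AQE aims for when coalescing or splitting partitions
spark.conf.set("spark.sql.adaptive.advisoryPartitionSizeInBytes", "128m")
```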
&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;&lt;STRONG&gt;Execution Memory&lt;/STRONG&gt;:
&lt;UL&gt;
&lt;LI&gt;Executors have a unified memory allocation, calculated as &lt;CODE&gt;0.6 * (availableMemory - reservedMemory)&lt;/CODE&gt;, which leaves approximately 13 GB per executor for execution and storage. When tasks need more than this, Spark spills to disk where it can; OOM occurs when a task's working set (for example, the hash table of a join or aggregation, or a single oversized partition) cannot be spilled.&lt;/LI&gt;
&lt;LI&gt;Because only four tasks run in parallel on each executor (four cores per executor), memory may only need to accommodate these four concurrent tasks. However, if any single task exceeds its share of memory, you'll encounter OOM. Ensure partitions are small enough for this allocation.&lt;/LI&gt;
&lt;/UL&gt;
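The ~13 GB figure can be checked with plain arithmetic, which also answers the "4 vs. 13 partitions" question from the original post: only the concurrently running tasks need to fit. A sketch using the numbers from this thread and Spark's defaults (`spark.memory.fraction` = 0.6, 300MB reserved):

```python
# Reproduce the per-executor unified-memory estimate from this thread.
executor_heap_mb = 22_000   # 22GB allocated per executor
reserved_mb = 300           # Spark's reserved memory
memory_fraction = 0.6       # spark.memory.fraction default

unified_mb = memory_fraction * (executor_heap_mb - reserved_mb)
per_task_mb = unified_mb / 4  # 4 cores, so at most 4 concurrent tasks

print(round(unified_mb))   # 13020 MB, i.e. ~13GB of execution+storage memory
print(round(per_task_mb))  # 3255 MB available per concurrently running task
```

So a single task whose working set exceeds roughly 3.3GB (for example, one badly skewed partition) can OOM the executor even though the cluster total of 330GB looks ample.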
&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;&lt;STRONG&gt;Storage vs. Execution Memory&lt;/STRONG&gt;:
&lt;UL&gt;
&lt;LI&gt;Managing memory pressure from intermediate data (shuffle, join, or aggregation output) can help reduce OOM issues. You can adjust the memory configuration to allocate more of the heap to the unified (execution + storage) region: &lt;CODE&gt;spark.conf.set("spark.memory.fraction", "0.8")  # default is 0.6&lt;/CODE&gt;&lt;/LI&gt;
&lt;LI&gt;Alternatively, forcing intermediate spills to disk earlier (instead of keeping them in-memory) could mitigate constraints.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;&lt;STRONG&gt;Cluster Configuration&lt;/STRONG&gt;:
&lt;UL&gt;
&lt;LI&gt;Evaluate the vertical and horizontal scaling of your cluster. If OOM persists despite partition adjustments, consider increasing the memory for each executor or the number of workers to spread the load more evenly.&lt;/LI&gt;
&lt;LI&gt;For instance:
&lt;UL&gt;
&lt;LI&gt;If upgrading workers, opt for instance types optimized for memory.&lt;/LI&gt;
&lt;LI&gt;If increasing the number of workers, repartition the data to maximize parallelism.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
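The repartitioning advice above can be turned into a rough sizing rule (a sketch only: the 128MB target and the task-slot floor are common rules of thumb, not official formulas, and `suggested_partitions` is a hypothetical helper):

```python
import math

def suggested_partitions(input_size_gb, target_partition_mb=128, min_parallelism=60):
    """Suggest a partition count: enough partitions to keep each under the
    target size, but never fewer than the cluster's task slots."""
    by_size = math.ceil(input_size_gb * 1024 / target_partition_mb)
    return max(by_size, min_parallelism)

# 55GB input on 15 workers * 4 cores = 60 task slots
print(suggested_partitions(55))  # → 440, well above the 200 used in this job
```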
&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;&lt;STRONG&gt;Additional Debugging Tips&lt;/STRONG&gt;:
&lt;UL&gt;
&lt;LI&gt;Enable more detailed logging and diagnostic tools to pinpoint challenges in specific stages or tasks.&lt;/LI&gt;
&lt;LI&gt;Use the Spark SQL and Catalyst optimizations (&lt;CODE&gt;explain()&lt;/CODE&gt; function) to understand how transformations and actions are executed. An optimal logical and physical plan helps avoid performance bottlenecks.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/DIV&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;DIV class="paragraph"&gt;These steps should help you identify and mitigate the OOM issue affecting your job. As always, iterative tuning and profiling based on specific details of your workload is key to achieving optimal performance.&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;Hope this helps, Big Roux.&lt;/DIV&gt;</description>
      <pubDate>Tue, 06 May 2025 20:04:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/smaller-dataset-causing-oom-on-large-cluster/m-p/117991#M9969</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2025-05-06T20:04:50Z</dc:date>
    </item>
    <item>
      <title>Re: Smaller dataset causing OOM on large cluster</title>
      <link>https://community.databricks.com/t5/get-started-discussions/smaller-dataset-causing-oom-on-large-cluster/m-p/118045#M9970</link>
      <description>&lt;P&gt;Thank you so much for the detailed response, much appreciated. Two follow-up questions:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;How do we check the partition size (or skew) for failed tasks from the UI? For example, if I go to the Spark UI for the failed stage, it gives the summary below. It shows 4 tasks as failed, but does not indicate the partition size that caused the OOM.&lt;/LI&gt;&lt;LI&gt;'Summary Metrics' indicates a Max Shuffle Read Size of 838.1MB. Just curious, isn't that too small a size to cause OOM?&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Klusener_0-1746599429539.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/16557iFE721C63F6827000/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Klusener_0-1746599429539.png" alt="Klusener_0-1746599429539.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 07 May 2025 06:33:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/smaller-dataset-causing-oom-on-large-cluster/m-p/118045#M9970</guid>
      <dc:creator>Klusener</dc:creator>
      <dc:date>2025-05-07T06:33:18Z</dc:date>
    </item>
    <item>
      <title>Re: Smaller dataset causing OOM on large cluster</title>
      <link>https://community.databricks.com/t5/get-started-discussions/smaller-dataset-causing-oom-on-large-cluster/m-p/118130#M9971</link>
      <description>&lt;P&gt;We will get back to you shortly.&lt;/P&gt;</description>
      <pubDate>Wed, 07 May 2025 11:18:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/smaller-dataset-causing-oom-on-large-cluster/m-p/118130#M9971</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2025-05-07T11:18:26Z</dc:date>
    </item>
    <item>
      <title>Re: Smaller dataset causing OOM on large cluster</title>
      <link>https://community.databricks.com/t5/get-started-discussions/smaller-dataset-causing-oom-on-large-cluster/m-p/118177#M9972</link>
      <description>&lt;P&gt;You need to enable more metrics. Click the link below and turn on all metrics.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="mark_ott_0-1746623229451.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/16595i342B6DB81B0D56BF/image-size/medium?v=v2&amp;amp;px=400" role="button" title="mark_ott_0-1746623229451.png" alt="mark_ott_0-1746623229451.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 07 May 2025 13:07:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/smaller-dataset-causing-oom-on-large-cluster/m-p/118177#M9972</guid>
      <dc:creator>mark_ott</dc:creator>
      <dc:date>2025-05-07T13:07:43Z</dc:date>
    </item>
    <item>
      <title>Re: Smaller dataset causing OOM on large cluster</title>
      <link>https://community.databricks.com/t5/get-started-discussions/smaller-dataset-causing-oom-on-large-cluster/m-p/118183#M9973</link>
      <description>&lt;P&gt;Oops, that's the wrong pic. Here's the correct one.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="metrics.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/16599i0A07DF6DFABE0BA9/image-size/medium?v=v2&amp;amp;px=400" role="button" title="metrics.png" alt="metrics.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt; &lt;/P&gt;</description>
      <pubDate>Wed, 07 May 2025 13:09:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/smaller-dataset-causing-oom-on-large-cluster/m-p/118183#M9973</guid>
      <dc:creator>mark_ott</dc:creator>
      <dc:date>2025-05-07T13:09:46Z</dc:date>
    </item>
    <item>
      <title>Re: Smaller dataset causing OOM on large cluster</title>
      <link>https://community.databricks.com/t5/get-started-discussions/smaller-dataset-causing-oom-on-large-cluster/m-p/118198#M9974</link>
      <description>&lt;P&gt;I'm guessing you are running one or more wide transformations in your query, and that is causing skewed shuffle partitions. Go back to the Stages tab and check the 'Shuffle Write Size/Records' row.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="m2.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/16607iB769923247FBF75D/image-size/medium?v=v2&amp;amp;px=400" role="button" title="m2.png" alt="m2.png" /&gt;&lt;/span&gt;&lt;/P&gt;
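The check being suggested here, eyeballing 'Shuffle Write Size/Records' for outliers, can be sketched in plain Python. The factor-of-5 cutoff mirrors AQE's default skewed-partition factor and is used purely for illustration; `skewed_partitions` and the sizes are hypothetical:

```python
from statistics import median

def skewed_partitions(sizes_mb, factor=5.0):
    """Flag partitions whose size exceeds `factor` times the median size."""
    med = median(sizes_mb)
    return [s for s in sizes_mb if s > factor * med]

# Per-partition shuffle-write sizes as read off the Stages tab (made-up numbers)
sizes = [250, 270, 260, 255, 1900, 265]
print(skewed_partitions(sizes))  # → [1900]
```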
&lt;P&gt; &lt;/P&gt;</description>
      <pubDate>Wed, 07 May 2025 13:35:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/smaller-dataset-causing-oom-on-large-cluster/m-p/118198#M9974</guid>
      <dc:creator>mark_ott</dc:creator>
      <dc:date>2025-05-07T13:35:31Z</dc:date>
    </item>
    <item>
      <title>Re: Smaller dataset causing OOM on large cluster</title>
      <link>https://community.databricks.com/t5/get-started-discussions/smaller-dataset-causing-oom-on-large-cluster/m-p/118203#M9975</link>
      <description>&lt;P&gt;I'm guessing you have shuffle write sizes that are &amp;gt; 1GB. That's when things start going down the rathole with spill and OOM. Here are a few questions for you: Is Adaptive Query Execution enabled? Also, I saw some nasty Java garbage collection in your earlier screenshot. Is your cluster Photon-enabled? That can reduce the GC pressure.&lt;/P&gt;</description>
      <pubDate>Wed, 07 May 2025 13:40:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/smaller-dataset-causing-oom-on-large-cluster/m-p/118203#M9975</guid>
      <dc:creator>mark_ott</dc:creator>
      <dc:date>2025-05-07T13:40:06Z</dc:date>
    </item>
    <item>
      <title>Re: Smaller dataset causing OOM on large cluster</title>
      <link>https://community.databricks.com/t5/get-started-discussions/smaller-dataset-causing-oom-on-large-cluster/m-p/118205#M9976</link>
      <description>&lt;P&gt;Other things to consider: By any chance do you have Spot instances turned on for workers (an edge case)? I've seen this handcuff AQE. If you have a join, is the smaller table the first table in the JOIN? Are you running ANALYZE TABLE, which can change the join strategy to one that won't go OOM? These are some things to consider.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 07 May 2025 13:43:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/smaller-dataset-causing-oom-on-large-cluster/m-p/118205#M9976</guid>
      <dc:creator>mark_ott</dc:creator>
      <dc:date>2025-05-07T13:43:48Z</dc:date>
    </item>
    <item>
      <title>Re: Smaller dataset causing OOM on large cluster</title>
      <link>https://community.databricks.com/t5/get-started-discussions/smaller-dataset-causing-oom-on-large-cluster/m-p/118236#M9977</link>
      <description>&lt;P&gt;Much appreciated, &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/82205"&gt;@mark_ott&lt;/a&gt; and &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/34815"&gt;@Louis_Frolio&lt;/a&gt;, for the prompt responses.&lt;/P&gt;&lt;P&gt;The job uses the cluster/settings below.&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Cluster/Spark version -&amp;nbsp;&lt;SPAN&gt;Driver: n2-highmem-4 · Workers: n2-highmem-4 · 5-15 workers · DBR: 15.4 LTS (includes Apache Spark 3.5.0, Scala 2.12) on GCP&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Photon is not enabled&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;Spot/preemptible instances are enabled&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;The rest are default Databricks settings; no configs are set explicitly&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;SPAN&gt;I just enabled 'Show Additional Metrics' on the stage and am attaching the job/stage/task details from the Spark UI. Only a single job and stage failed. There is no shuffle write.
Isn't AQE enabled by default from Spark 3 onwards?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Klusener_0-1746629479189.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/16623iCEE4A0A636ED8C5C/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Klusener_0-1746629479189.png" alt="Klusener_0-1746629479189.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Klusener_1-1746629503205.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/16624i3698A7BD80CBBF53/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Klusener_1-1746629503205.png" alt="Klusener_1-1746629503205.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Klusener_2-1746629611871.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/16625iB67F9C5763461D9D/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Klusener_2-1746629611871.png" alt="Klusener_2-1746629611871.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 07 May 2025 15:06:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/smaller-dataset-causing-oom-on-large-cluster/m-p/118236#M9977</guid>
      <dc:creator>Klusener</dc:creator>
      <dc:date>2025-05-07T15:06:57Z</dc:date>
    </item>
    <item>
      <title>Re: Smaller dataset causing OOM on large cluster</title>
      <link>https://community.databricks.com/t5/get-started-discussions/smaller-dataset-causing-oom-on-large-cluster/m-p/118473#M9978</link>
      <description>&lt;P&gt;OK, without having your code or DAG, it's a little difficult to figure this out. But here's something that should work. First, figure out how many memory partitions you have. Apparently, your memory partitions are too big for the cluster, hence the OOM. Use this generic code as a template.&lt;/P&gt;
&lt;P&gt;&lt;CODE&gt;num_partitions = df.rdd.getNumPartitions()&lt;/CODE&gt;&lt;/P&gt;
&lt;P&gt;&lt;CODE&gt;print(num_partitions)&lt;/CODE&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 08 May 2025 13:04:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/smaller-dataset-causing-oom-on-large-cluster/m-p/118473#M9978</guid>
      <dc:creator>mark_ott</dc:creator>
      <dc:date>2025-05-08T13:04:41Z</dc:date>
    </item>
    <item>
      <title>Re: Smaller dataset causing OOM on large cluster</title>
      <link>https://community.databricks.com/t5/get-started-discussions/smaller-dataset-causing-oom-on-large-cluster/m-p/118477#M9979</link>
      <description>&lt;P&gt;Next, use &lt;STRONG&gt;repartition(n)&lt;/STRONG&gt; to increase your dataframe to twice the number you got earlier. For example, if num_partitions was 30, then call &lt;STRONG&gt;repartition(60)&lt;/STRONG&gt; prior to running your query. With half the data in each memory partition, I'm guessing you won't OOM. If you still do, double the number again until the OOM disappears.&lt;/P&gt;</description>
      <pubDate>Thu, 08 May 2025 13:08:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/smaller-dataset-causing-oom-on-large-cluster/m-p/118477#M9979</guid>
      <dc:creator>mark_ott</dc:creator>
      <dc:date>2025-05-08T13:08:07Z</dc:date>
    </item>
  </channel>
</rss>

