<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: calculate the number of parallel tasks that can be executed in a Databricks PySpark cluster in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/calculate-the-number-of-parallel-tasks-that-can-be-executed-in-a/m-p/92682#M38505</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;, this is really fantastic guidance; will something similar be added to the Databricks docs?&lt;/P&gt;</description>
    <pubDate>Thu, 03 Oct 2024 17:52:10 GMT</pubDate>
    <dc:creator>dylanberry</dc:creator>
    <dc:date>2024-10-03T17:52:10Z</dc:date>
    <item>
      <title>calculate the number of parallel tasks that can be executed in a Databricks PySpark cluster</title>
      <link>https://community.databricks.com/t5/data-engineering/calculate-the-number-of-parallel-tasks-that-can-be-executed-in-a/m-p/66254#M33065</link>
      <description>&lt;P&gt;I want to confirm whether this understanding is correct.&lt;/P&gt;&lt;P&gt;To calculate the number of parallel tasks a Databricks PySpark cluster can execute with the given configuration, we need to consider how many executors fit on each node and how many tasks each executor can run.&lt;/P&gt;&lt;P&gt;Here is the breakdown:&lt;/P&gt;&lt;P&gt;Number of Nodes: 10&lt;BR /&gt;CPU Cores per Node: 16&lt;BR /&gt;RAM per Node: 64 GB&lt;BR /&gt;Executor Size: 5 CPU cores and 20 GB RAM per executor&lt;BR /&gt;Overhead: 1 CPU core and 4 GB RAM reserved per node for background processes&lt;/P&gt;&lt;P&gt;With 1 core and 4 GB RAM reserved on each node, 15 CPU cores and 60 GB RAM remain available for executors per node.&lt;/P&gt;&lt;P&gt;Since each executor requires 5 CPU cores and 20 GB RAM, each node can run 3 executors (15 cores / 5 cores per executor = 3, and 60 GB / 20 GB per executor = 3, so neither resource is the bottleneck).&lt;/P&gt;&lt;P&gt;Across 10 nodes, that gives 30 executors in total (10 nodes * 3 executors per node).&lt;/P&gt;&lt;P&gt;By default, each executor runs one task per core, so an executor with 5 CPU cores can run 5 tasks in parallel.&lt;/P&gt;&lt;P&gt;Therefore, the total number of parallel tasks across the cluster is 150 (30 executors * 5 tasks per executor).&lt;/P&gt;&lt;P&gt;So, with the provided configuration, this Databricks PySpark cluster can execute 150 tasks in parallel.&lt;/P&gt;</description>
      <pubDate>Mon, 15 Apr 2024 11:00:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/calculate-the-number-of-parallel-tasks-that-can-be-executed-in-a/m-p/66254#M33065</guid>
      <dc:creator>manish1987c</dc:creator>
      <dc:date>2024-04-15T11:00:51Z</dc:date>
    </item>
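The sizing arithmetic in the post above can be sketched as a short Python check. All figures are the poster's example configuration, hard-coded for illustration; nothing here is queried from a Databricks API.

```python
# Cluster sizing sketch using the example values from the post above.
nodes = 10
cores_per_node = 16
ram_gb_per_node = 64
executor_cores = 5       # cores requested per executor
executor_ram_gb = 20     # memory requested per executor
overhead_cores = 1       # reserved per node for background processes
overhead_ram_gb = 4

usable_cores = cores_per_node - overhead_cores      # 15 cores per node
usable_ram_gb = ram_gb_per_node - overhead_ram_gb   # 60 GB per node

# Executors per node is limited by whichever resource runs out first.
executors_per_node = min(usable_cores // executor_cores,
                         usable_ram_gb // executor_ram_gb)

total_executors = nodes * executors_per_node
# By default, Spark runs one task per executor core (spark.task.cpus = 1).
parallel_tasks = total_executors * executor_cores

print(executors_per_node, total_executors, parallel_tasks)  # 3 30 150
```

The `min(...)` over cores and memory is the key step: if the executor memory request grew to, say, 30 GB, memory would become the bottleneck and only 2 executors would fit per node despite the spare cores.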
    <item>
      <title>Re: calculate the number of parallel tasks that can be executed in a Databricks PySpark cluster</title>
      <link>https://community.databricks.com/t5/data-engineering/calculate-the-number-of-parallel-tasks-that-can-be-executed-in-a/m-p/92682#M38505</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;, this is really fantastic guidance; will something similar be added to the Databricks docs?&lt;/P&gt;</description>
      <pubDate>Thu, 03 Oct 2024 17:52:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/calculate-the-number-of-parallel-tasks-that-can-be-executed-in-a/m-p/92682#M38505</guid>
      <dc:creator>dylanberry</dc:creator>
      <dc:date>2024-10-03T17:52:10Z</dc:date>
    </item>
  </channel>
</rss>

