<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Run more than nr-of-cores concurrent tasks. in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/run-more-than-nr-of-cores-concurrent-tasks/m-p/15188#M9545</link>
    <description>&lt;P&gt;We are using the Terraform Databricks provider, which starts a cluster and checks every mount (since there is no REST API for mounts!). It starts one job per mount, and each check takes about 20 seconds, 99.9% of which is spent idle, waiting. If we could run many jobs concurrently (more than the number of cores), we should be able to make this much faster, but I can't find how to do it.&lt;/P&gt;&lt;P&gt;I have tried setting `spark.executor.instances` to 2*cores, but it seems to be ignored.&lt;/P&gt;&lt;P&gt;So, is it possible to make Databricks use more Spark executors than the number of cores?&lt;/P&gt;</description>
    <pubDate>Mon, 20 Sep 2021 12:46:56 GMT</pubDate>
    <dc:creator>Erik</dc:creator>
    <dc:date>2021-09-20T12:46:56Z</dc:date>
    <item>
      <title>Run more than nr-of-cores concurrent tasks.</title>
      <link>https://community.databricks.com/t5/data-engineering/run-more-than-nr-of-cores-concurrent-tasks/m-p/15188#M9545</link>
      <description>&lt;P&gt;We are using the Terraform Databricks provider, which starts a cluster and checks every mount (since there is no REST API for mounts!). It starts one job per mount, and each check takes about 20 seconds, 99.9% of which is spent idle, waiting. If we could run many jobs concurrently (more than the number of cores), we should be able to make this much faster, but I can't find how to do it.&lt;/P&gt;&lt;P&gt;I have tried setting `spark.executor.instances` to 2*cores, but it seems to be ignored.&lt;/P&gt;&lt;P&gt;So, is it possible to make Databricks use more Spark executors than the number of cores?&lt;/P&gt;</description>
      <pubDate>Mon, 20 Sep 2021 12:46:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/run-more-than-nr-of-cores-concurrent-tasks/m-p/15188#M9545</guid>
      <dc:creator>Erik</dc:creator>
      <dc:date>2021-09-20T12:46:56Z</dc:date>
    </item>
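Because each mount check spends almost all of its time waiting on I/O, the useful level of concurrency is not bounded by the core count when the waiting is done by threads rather than by Spark task slots. The following is a minimal Python sketch of that general idea only; it is not a Databricks or Terraform provider setting. `check_mount` and `check_all` are hypothetical helpers, the I/O wait is simulated with `time.sleep`, and four threads per core is an arbitrary assumption.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-in for one mount check: almost all of its time is
# spent waiting, simulated here with time.sleep.
def check_mount(name, wait_seconds=0.2):
    time.sleep(wait_seconds)
    return name, "ok"


def check_all(names, workers):
    # Threads release the GIL while sleeping or blocked on I/O, so the pool
    # can usefully run more workers than there are CPU cores.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(check_mount, names))


if __name__ == "__main__":
    mounts = [f"/mnt/mount-{i}" for i in range(16)]
    cores = os.cpu_count() or 1
    start = time.perf_counter()
    results = check_all(mounts, workers=4 * cores)
    elapsed = time.perf_counter() - start
    print(len(results), "mounts checked in", round(elapsed, 2), "s")
```

With enough workers, the total wall time is close to a single check's wait rather than the sum of all the waits.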
    <item>
      <title>Re: Run more than nr-of-cores concurrent tasks.</title>
      <link>https://community.databricks.com/t5/data-engineering/run-more-than-nr-of-cores-concurrent-tasks/m-p/15190#M9547</link>
      <description>&lt;P&gt;Hey @Kaniz Fatma, it does not seem like the community has an answer to this. Maybe you have access to some Databricks engineers who know the answer?&lt;/P&gt;</description>
      <pubDate>Tue, 05 Oct 2021 11:16:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/run-more-than-nr-of-cores-concurrent-tasks/m-p/15190#M9547</guid>
      <dc:creator>Erik</dc:creator>
      <dc:date>2021-10-05T11:16:59Z</dc:date>
    </item>
    <item>
      <title>Re: Run more than nr-of-cores concurrent tasks.</title>
      <link>https://community.databricks.com/t5/data-engineering/run-more-than-nr-of-cores-concurrent-tasks/m-p/15192#M9549</link>
      <description>&lt;P&gt;Hi @Erik Parmann,&lt;/P&gt;&lt;P&gt;Does this old post help? &lt;A href="https://community.databricks.com/s/question/0D53f00001LKDICCA5/how-to-restrict-the-number-of-tasks-per-executor" alt="https://community.databricks.com/s/question/0D53f00001LKDICCA5/how-to-restrict-the-number-of-tasks-per-executor" target="_blank"&gt;link&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Also, where did you add the Spark configuration for "spark.executor.instances"? It should be set in the cluster-level settings.&lt;/P&gt;</description>
      <pubDate>Thu, 07 Oct 2021 17:52:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/run-more-than-nr-of-cores-concurrent-tasks/m-p/15192#M9549</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2021-10-07T17:52:44Z</dc:date>
    </item>
    <item>
      <title>Re: Run more than nr-of-cores concurrent tasks.</title>
      <link>https://community.databricks.com/t5/data-engineering/run-more-than-nr-of-cores-concurrent-tasks/m-p/15193#M9550</link>
      <description>&lt;P&gt;Hi @Jose Gonzalez, thanks for the suggestion. But that link asks how to *limit* the number of executors so that each one gets more memory. I want the opposite: *more* executors per core (or each executor running many tasks in parallel). The default value of `spark.task.cpus` is `1`, and it does not accept a fractional value such as 0.1; with that setting the cluster refuses to start.&lt;/P&gt;&lt;P&gt;I set the cluster-level settings under "Advanced options". Below is a screenshot of how I tried editing the spark.task.cpus setting:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="cluster-config"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/2423iC534DDA3C27A9BFD/image-size/large?v=v2&amp;amp;px=999" role="button" title="cluster-config" alt="cluster-config" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 09 Oct 2021 11:54:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/run-more-than-nr-of-cores-concurrent-tasks/m-p/15193#M9550</guid>
      <dc:creator>Erik</dc:creator>
      <dc:date>2021-10-09T11:54:10Z</dc:date>
    </item>
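The ceiling described above follows from how Spark counts task slots: each executor runs at most `spark.executor.cores // spark.task.cpus` tasks at once, and because `spark.task.cpus` is an integer setting, every task reserves at least one whole core. Below is a small Python sketch of that arithmetic, assuming CPU is the only resource being scheduled; `task_slots` is a hypothetical helper, not a Spark API.

```python
def task_slots(executors, executor_cores, task_cpus=1):
    """Concurrent task slots under Spark's CPU-based scheduling (sketch).

    spark.task.cpus is an integer setting, so fractional values such as
    0.1 are rejected here, mirroring the behaviour described in the thread.
    """
    if not (isinstance(task_cpus, int) and task_cpus >= 1):
        raise ValueError("spark.task.cpus must be a positive integer")
    if task_cpus > executor_cores:
        raise ValueError("each executor needs at least spark.task.cpus cores")
    # Each executor can hold executor_cores // task_cpus running tasks.
    return executors * (executor_cores // task_cpus)


if __name__ == "__main__":
    print(task_slots(1, 4))               # one 4-core executor, default task.cpus
    print(task_slots(1, 4, task_cpus=2))  # fewer slots when each task needs 2 cores
```

With one 4-core executor and the default `spark.task.cpus` of 1, `task_slots(1, 4)` returns 4, the same 4-task limit seen on a 4-core machine. Raising `spark.task.cpus` can only lower this number, never raise it.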
    <item>
      <title>Re: Run more than nr-of-cores concurrent tasks.</title>
      <link>https://community.databricks.com/t5/data-engineering/run-more-than-nr-of-cores-concurrent-tasks/m-p/15194#M9551</link>
      <description>&lt;P&gt;@Jose Gonzalez @Kaniz Fatma: Since there are no more answers, I am starting to believe that it may not be possible to get Databricks to use more Spark executors than the number of cores. Can you verify this for me?&lt;/P&gt;</description>
      <pubDate>Fri, 15 Oct 2021 10:23:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/run-more-than-nr-of-cores-concurrent-tasks/m-p/15194#M9551</guid>
      <dc:creator>Erik</dc:creator>
      <dc:date>2021-10-15T10:23:35Z</dc:date>
    </item>
    <item>
      <title>Re: Run more than nr-of-cores concurrent tasks.</title>
      <link>https://community.databricks.com/t5/data-engineering/run-more-than-nr-of-cores-concurrent-tasks/m-p/15195#M9552</link>
      <description>&lt;P&gt;Hi @Erik Parmann,&lt;/P&gt;&lt;P&gt;It is possible, but you might also need to enable dynamic allocation at the cluster level to make sure your settings are applied at cluster creation. You can find more details &lt;A href="https://spark.apache.org/docs/latest/configuration.html#dynamic-allocation" alt="https://spark.apache.org/docs/latest/configuration.html#dynamic-allocation" target="_blank"&gt;here&lt;/A&gt;. As a best practice, we do not recommend changing these configurations, because doing so might create other issues. We recommend using the default options we provide.&lt;/P&gt;</description>
      <pubDate>Mon, 18 Oct 2021 22:00:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/run-more-than-nr-of-cores-concurrent-tasks/m-p/15195#M9552</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2021-10-18T22:00:33Z</dc:date>
    </item>
    <item>
      <title>Re: Run more than nr-of-cores concurrent tasks.</title>
      <link>https://community.databricks.com/t5/data-engineering/run-more-than-nr-of-cores-concurrent-tasks/m-p/15196#M9553</link>
      <description>&lt;P&gt;Thanks for the reply! I understand that the default options are good in general, but in this exact use case (many tiny operations, each of which is 99.99999% I/O-bound) they are really suboptimal, and that makes the Databricks-with-IaC experience a bit cumbersome.&lt;/P&gt;&lt;P&gt;I tried the following settings in the "Spark Config" section:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;spark.dynamicAllocation.enabled true
spark.dynamicAllocation.shuffleTracking.enabled true
spark.shuffle.service.enabled true
spark.dynamicAllocation.initialExecutors 8
spark.dynamicAllocation.minExecutors 8
spark.scheduler.mode FIFO&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="bilde"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/2425i4D656EE9CD57AB88/image-size/large?v=v2&amp;amp;px=999" role="button" title="bilde" alt="bilde" /&gt;&lt;/span&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;But on a 4-core machine I still only get 1 executor (as seen in the "Spark Cluster UI" tab), executing up to 4 tasks in parallel. I tried both a "High Concurrency" and a "Standard" cluster. Are you actually able to get many executors running by changing "spark.dynamicAllocation.enabled" and "spark.dynamicAllocation.minExecutors"?&lt;/P&gt;</description>
      <pubDate>Wed, 20 Oct 2021 11:32:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/run-more-than-nr-of-cores-concurrent-tasks/m-p/15196#M9553</guid>
      <dc:creator>Erik</dc:creator>
      <dc:date>2021-10-20T11:32:51Z</dc:date>
    </item>
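A rough model of why raising `spark.dynamicAllocation.minExecutors` does not by itself add parallelism: Spark's dynamic allocation derives its executor target from the number of running and pending tasks divided by the tasks one executor can run, and then keeps that target between the min and max settings. The sketch below is a simplified Python approximation of that calculation. It ignores ramp-up timing and the cluster manager, and `target_executors` is a hypothetical helper. The target is only a request; whether extra executors actually start depends on how many cores the workers offer, and each executor still runs no more tasks than its own cores allow.

```python
import math


def target_executors(running_or_pending_tasks, executor_cores, task_cpus=1,
                     min_executors=0, max_executors=None,
                     allocation_ratio=1.0):
    # Simplified approximation of Spark's dynamic-allocation target
    # (not the real implementation).
    # Tasks one executor can run at once (see spark.task.cpus).
    tasks_per_executor = executor_cores // task_cpus
    # Executors needed to give every running or pending task a slot.
    needed = math.ceil(
        running_or_pending_tasks * allocation_ratio / tasks_per_executor
    )
    # minExecutors is a floor; maxExecutors (unbounded by default) is a cap.
    target = max(min_executors, needed)
    if max_executors is not None:
        target = min(max_executors, target)
    return target


if __name__ == "__main__":
    # With minExecutors 8, the target is 8 even with no pending tasks,
    # but each requested executor still offers only its own cores as slots.
    print(target_executors(0, 4, min_executors=8))
    print(target_executors(10, 4))
```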
  </channel>
</rss>

