<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to setup an all-purpose cluster pool for all my jobs? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-setup-an-all-purpose-cluster-pool-for-all-my-jobs/m-p/27838#M19686</link>
    <description>&lt;UL&gt;&lt;LI&gt;Autoscaling scales a cluster up only when the workload requires it, for example because of the size of the dataset. Another job will create a new cluster that uses idle machines from the pool and, if none are idle, deploys new ones.&lt;/LI&gt;&lt;LI&gt;The pool is designed so that another job can reuse VMs. I see two strategies:&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;1) set min idle to some number, so that machines are waiting to handle your jobs, and reserve those machines to get a discount,&lt;/P&gt;&lt;P&gt;2) or just the opposite: set min idle to 0 and use spot instances.&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Regarding the errors, please check that you are not hitting quotas at your cloud provider (for example, in the Azure portal, type "quotas" in the search box).&lt;/LI&gt;&lt;/UL&gt;</description>
    <pubDate>Wed, 12 Oct 2022 17:12:20 GMT</pubDate>
    <dc:creator>Hubert-Dudek</dc:creator>
    <dc:date>2022-10-12T17:12:20Z</dc:date>
    <item>
      <title>How to setup an all-purpose cluster pool for all my jobs?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-setup-an-all-purpose-cluster-pool-for-all-my-jobs/m-p/27837#M19685</link>
      <description>&lt;P&gt;Today we started setting up an all-purpose cluster pool for all the jobs that we run on Databricks. We followed the documentation, but we ran into some issues when running our jobs.&lt;/P&gt;&lt;P&gt;The errors in the jobs are the following:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="Error message"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1369i1299D45EB48789E8/image-size/large?v=v2&amp;amp;px=999" role="button" title="Error message" alt="Error message" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;The jobs run in parallel, as shown here:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="Jobs"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1365iD7BE3510386896DA/image-size/large?v=v2&amp;amp;px=999" role="button" title="Jobs" alt="Jobs" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;The pool has the following configuration:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="pool configuration"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1364i77131F5D7573AC8E/image-size/large?v=v2&amp;amp;px=999" role="button" title="pool configuration" alt="pool configuration" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;The cluster has the following configuration:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="cluster configuration"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1371i3C4085F13ECD7117/image-size/large?v=v2&amp;amp;px=999" role="button" title="cluster configuration" alt="cluster configuration" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;The log4j-active log is in the attachment.&lt;/P&gt;&lt;P&gt;Furthermore, I saw that autoscaling did not scale up when multiple jobs ran at the same time. When multiple jobs run on the same pool, it should autoscale, right?&lt;/P&gt;&lt;P&gt;Thanks for your time!&lt;/P&gt;</description>
      <pubDate>Wed, 12 Oct 2022 12:52:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-setup-an-all-purpose-cluster-pool-for-all-my-jobs/m-p/27837#M19685</guid>
      <dc:creator>Siebert_Looije</dc:creator>
      <dc:date>2022-10-12T12:52:57Z</dc:date>
    </item>
    <item>
      <title>Re: How to setup an all-purpose cluster pool for all my jobs?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-setup-an-all-purpose-cluster-pool-for-all-my-jobs/m-p/27838#M19686</link>
      <description>&lt;UL&gt;&lt;LI&gt;Autoscaling scales a cluster up only when the workload requires it, for example because of the size of the dataset. Another job will create a new cluster that uses idle machines from the pool and, if none are idle, deploys new ones.&lt;/LI&gt;&lt;LI&gt;The pool is designed so that another job can reuse VMs. I see two strategies:&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;1) set min idle to some number, so that machines are waiting to handle your jobs, and reserve those machines to get a discount,&lt;/P&gt;&lt;P&gt;2) or just the opposite: set min idle to 0 and use spot instances.&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Regarding the errors, please check that you are not hitting quotas at your cloud provider (for example, in the Azure portal, type "quotas" in the search box).&lt;/LI&gt;&lt;/UL&gt;</description>
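      <!--
      Editor's sketch, not part of the original reply: the two strategies in this post could be written as request bodies for the Databricks Instance Pools API roughly as follows. The pool names, VM size, and capacity numbers are hypothetical, and the field names (`min_idle_instances`, `max_capacity`, `azure_attributes.availability`) are given as I understand the API, so check them against the current reference before use. Nothing is sent to a workspace; the code only builds and prints the payloads.

```python
import json


def pool_payload(name: str, min_idle: int, use_spot: bool, max_capacity: int = 20) -> dict:
    """Build a request body for creating an instance pool (assumed field names)."""
    if min_idle > max_capacity:
        raise ValueError("min_idle cannot exceed max_capacity")
    return {
        "instance_pool_name": name,                   # hypothetical pool name
        "node_type_id": "Standard_DS3_v2",            # hypothetical Azure VM size
        "min_idle_instances": min_idle,               # VMs kept warm for the next job
        "max_capacity": max_capacity,                 # upper bound on idle plus in-use VMs
        "idle_instance_autotermination_minutes": 15,  # release unused VMs after this long
        "azure_attributes": {
            "availability": "SPOT_AZURE" if use_spot else "ON_DEMAND_AZURE",
        },
    }


# Strategy 1: keep some VMs idle and ready for incoming jobs.
warm_pool = pool_payload("jobs-warm-pool", min_idle=4, use_spot=False)

# Strategy 2: keep nothing idle and rely on spot VMs.
spot_pool = pool_payload("jobs-spot-pool", min_idle=0, use_spot=True)

print(json.dumps(warm_pool, indent=2))
print(json.dumps(spot_pool, indent=2))
```

      The reservation discount mentioned in strategy 1 is arranged with the cloud provider's billing, so it does not appear in the payload.
      -->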
      <pubDate>Wed, 12 Oct 2022 17:12:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-setup-an-all-purpose-cluster-pool-for-all-my-jobs/m-p/27838#M19686</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2022-10-12T17:12:20Z</dc:date>
    </item>
    <item>
      <title>Re: How to setup an all-purpose cluster pool for all my jobs?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-setup-an-all-purpose-cluster-pool-for-all-my-jobs/m-p/27839#M19687</link>
      <description>&lt;P&gt;Thanks for the explanation!&lt;/P&gt;&lt;P&gt;What should I define for the cluster, then? We have quite a few jobs that run in parallel. Should I define multiple clusters and put them in the pool, or is there a better way to add multiple clusters to the pool? Each job uses a new cluster.&lt;/P&gt;&lt;P&gt;Currently we start a job cluster per job and do not reuse it. We would like to find a way to reuse the job cluster (I thought this was what the pool feature was for).&lt;/P&gt;&lt;P&gt;What does the 'Failure starting repl.' error mean? Knowing that would help me narrow down which quotas we might be hitting.&lt;/P&gt;&lt;P&gt;Thanks for taking the time to answer the question!&lt;/P&gt;</description>
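      <!--
      Editor's sketch, not part of the original post: as far as I understand, clusters are not added to a pool. Each job cluster names the pool through `instance_pool_id` and takes its VMs from it, so parallel jobs each still get their own cluster. The sketch below builds such a cluster spec and checks whether several parallel clusters fit within the pool's maximum capacity. The job names, sizes, runtime version, and pool ID are hypothetical, and it assumes the driver comes from the same pool as the workers.

```python
from dataclasses import dataclass


@dataclass
class JobCluster:
    """Sizing of one job cluster drawn from the pool (hypothetical values)."""
    name: str
    min_workers: int
    max_workers: int

    def new_cluster_spec(self, pool_id: str) -> dict:
        # With a pool, the cluster refers to the pool instead of a node type.
        return {
            "spark_version": "11.3.x-scala2.12",  # example runtime version
            "instance_pool_id": pool_id,
            "autoscale": {
                "min_workers": self.min_workers,
                "max_workers": self.max_workers,
            },
        }

    def peak_instances(self) -> int:
        # One driver plus the maximum number of workers.
        return 1 + self.max_workers


def fits_in_pool(clusters, max_capacity: int) -> bool:
    """True if all clusters can run at full scale at the same time."""
    return sum(c.peak_instances() for c in clusters) <= max_capacity


jobs = [
    JobCluster("ingest", 1, 4),
    JobCluster("transform", 2, 8),
    JobCluster("export", 1, 2),
]
needed = sum(j.peak_instances() for j in jobs)  # 5 + 9 + 3
print(needed, fits_in_pool(jobs, max_capacity=10))
print(jobs[0].new_cluster_spec("hypothetical-pool-id"))
```

      If the check fails, parallel jobs can compete for instances, which is one reason a cluster might not scale up even though autoscaling is enabled.
      -->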
      <pubDate>Wed, 12 Oct 2022 17:49:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-setup-an-all-purpose-cluster-pool-for-all-my-jobs/m-p/27839#M19687</guid>
      <dc:creator>Siebert_Looije</dc:creator>
      <dc:date>2022-10-12T17:49:47Z</dc:date>
    </item>
    <item>
      <title>Re: How to setup an all-purpose cluster pool for all my jobs?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-setup-an-all-purpose-cluster-pool-for-all-my-jobs/m-p/27840#M19688</link>
      <description>&lt;P&gt;Hi @Siebert Looije,&lt;/P&gt;&lt;P&gt;Hope all is well! I just wanted to check whether you were able to resolve your issue. If so, would you be happy to share the solution or &lt;B&gt;mark an answer as best&lt;/B&gt;? Otherwise, please let us know if you need more help.&lt;/P&gt;&lt;P&gt;We'd love to hear from you.&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Sun, 27 Nov 2022 12:49:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-setup-an-all-purpose-cluster-pool-for-all-my-jobs/m-p/27840#M19688</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-11-27T12:49:31Z</dc:date>
    </item>
    <item>
      <title>Re: How to setup an all-purpose cluster pool for all my jobs?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-setup-an-all-purpose-cluster-pool-for-all-my-jobs/m-p/27841#M19689</link>
      <description>&lt;P&gt;Hi @Vidula Khanna, thanks for reaching out. No, I have not really found a solution to this yet. I had some follow-up questions, which have not been answered so far.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Nov 2022 06:54:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-setup-an-all-purpose-cluster-pool-for-all-my-jobs/m-p/27841#M19689</guid>
      <dc:creator>Siebert_Looije</dc:creator>
      <dc:date>2022-11-28T06:54:38Z</dc:date>
    </item>
  </channel>
</rss>

