<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic cluster nodes unavailable scenarios in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/cluster-nodes-unavailable-scenarios/m-p/44353#M27641</link>
    <description>&lt;DIV&gt;Concerning job cluster configuration, I'm trying to figure out what happens if AWS node type availability falls below the minimum number of workers specified in the configuration JSON (either &lt;SPAN&gt;availability &amp;lt; num_workers&lt;/SPAN&gt; or, for autoscaling, &lt;SPAN&gt;availability &amp;lt; min_workers&lt;/SPAN&gt;).&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;Seeking insights into both scenarios:&lt;/DIV&gt;&lt;OL&gt;&lt;LI&gt;Low availability at cluster start&lt;/LI&gt;&lt;LI&gt;Availability drop while computation is already in progress&lt;/LI&gt;&lt;/OL&gt;&lt;DIV&gt;&lt;BR /&gt;Will the cluster start/continue computation? Wait? Fail?&lt;BR /&gt;Are there configuration settings that control this cluster behavior?&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;Thanks!&lt;/DIV&gt;</description>
    <pubDate>Mon, 11 Sep 2023 13:35:42 GMT</pubDate>
    <dc:creator>Nino</dc:creator>
    <dc:date>2023-09-11T13:35:42Z</dc:date>
    <item>
      <title>cluster nodes unavailable scenarios</title>
      <link>https://community.databricks.com/t5/data-engineering/cluster-nodes-unavailable-scenarios/m-p/44353#M27641</link>
      <description>&lt;DIV&gt;Concerning job cluster configuration, I'm trying to figure out what happens if AWS node type availability falls below the minimum number of workers specified in the configuration JSON (either &lt;SPAN&gt;availability &amp;lt; num_workers&lt;/SPAN&gt; or, for autoscaling, &lt;SPAN&gt;availability &amp;lt; min_workers&lt;/SPAN&gt;).&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;Seeking insights into both scenarios:&lt;/DIV&gt;&lt;OL&gt;&lt;LI&gt;Low availability at cluster start&lt;/LI&gt;&lt;LI&gt;Availability drop while computation is already in progress&lt;/LI&gt;&lt;/OL&gt;&lt;DIV&gt;&lt;BR /&gt;Will the cluster start/continue computation? Wait? Fail?&lt;BR /&gt;Are there configuration settings that control this cluster behavior?&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;Thanks!&lt;/DIV&gt;</description>
      <pubDate>Mon, 11 Sep 2023 13:35:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/cluster-nodes-unavailable-scenarios/m-p/44353#M27641</guid>
      <dc:creator>Nino</dc:creator>
      <dc:date>2023-09-11T13:35:42Z</dc:date>
    </item>
    <item>
      <title>Re: cluster nodes unavailable scenarios</title>
      <link>https://community.databricks.com/t5/data-engineering/cluster-nodes-unavailable-scenarios/m-p/44451#M27653</link>
      <description>&lt;P&gt;Thanks, &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;, useful info!&lt;/P&gt;&lt;P&gt;My specific scenario is running a notebook task with Job Clusters, and I've noticed that I get the best overall notebook run time by going without Autoscaling and setting the cluster configuration with a fixed `num_workers`. Specifically, it is a single notebook where a heavy ETL operation is followed by a lightweight cmd cell, then something heavy again, so with Autoscaling the cluster scales up and down a lot.&lt;/P&gt;&lt;P&gt;So, by your explanation, the `num_workers` approach puts me at risk in the case of low instance availability. This can be mitigated by Autoscaling, which in turn leads to increased run time.&lt;/P&gt;&lt;P&gt;Is there a way to configure the Job Cluster so that it "aspires" to an ideal size, but doesn't fail if this ideal isn't reached?&lt;/P&gt;&lt;P&gt;This would be similar to Autoscaling, except that the cluster would not downsize voluntarily (it would downsize only if lowered availability forces it to, and even then wouldn't immediately fail). So if configured to "aspire" to 100 nodes, it would wait x minutes and then start if more than 50 nodes are available. If availability grows, say, 30 minutes later, it would upscale, "aspiring" to those 100...&lt;/P&gt;&lt;P&gt;Can something like this be achieved?&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Tue, 12 Sep 2023 07:11:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/cluster-nodes-unavailable-scenarios/m-p/44451#M27653</guid>
      <dc:creator>Nino</dc:creator>
      <dc:date>2023-09-12T07:11:07Z</dc:date>
    </item>
  </channel>
</rss>

