<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: SingleNode all-purpose cluster for small ETLs in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/singlenode-all-purpose-cluster-for-small-etls/m-p/32458#M23653</link>
    <description>&lt;P&gt;@E H&amp;nbsp;we're definitely thinking about budgets and quotas for jobs. There are several things we can do, ranked in order of rough complexity-to-implement:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Display the DBU cost of each job in the Jobs UI.&lt;/LI&gt;&lt;LI&gt;Alert on the DBU cost of a job (e.g. "Alert me if this job costs &amp;gt;20 DBUs")&lt;/LI&gt;&lt;LI&gt;Alert on the dollar cost of a job (e.g. "Alert me if this job costs &amp;gt;$5")&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thoughts on what you'd prefer?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 04 Jan 2022 10:36:44 GMT</pubDate>
    <dc:creator>BilalAslamDbrx</dc:creator>
    <dc:date>2022-01-04T10:36:44Z</dc:date>
    <item>
      <title>SingleNode all-purpose cluster for small ETLs</title>
      <link>https://community.databricks.com/t5/data-engineering/singlenode-all-purpose-cluster-for-small-etls/m-p/32453#M23648</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I have many "small" jobs that need to be executed quickly and at a predictable, low cost from several Azure Data Factory pipelines. For this reason, I configured a small single node cluster to execute those processes. For the moment, everything seems to run as expected, and each job takes approximately 30s after the first execution.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;However, based on the documentation, it seems that my use case is not officially supported. Am I understanding this correctly? Is this simply a warning, or will I have potential issues with this solution?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/2211i212C18ACD4F9651A/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 30 Dec 2021 01:43:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/singlenode-all-purpose-cluster-for-small-etls/m-p/32453#M23648</guid>
      <dc:creator>RicksDB</dc:creator>
      <dc:date>2021-12-30T01:43:46Z</dc:date>
    </item>
    <item>
      <title>Re: SingleNode all-purpose cluster for small ETLs</title>
      <link>https://community.databricks.com/t5/data-engineering/singlenode-all-purpose-cluster-for-small-etls/m-p/32454#M23649</link>
      <description>&lt;P&gt;Hello again! As before, if the community does not respond after a while, we'll get back to this.&lt;/P&gt;</description>
      <pubDate>Thu, 30 Dec 2021 17:29:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/singlenode-all-purpose-cluster-for-small-etls/m-p/32454#M23649</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2021-12-30T17:29:47Z</dc:date>
    </item>
    <item>
      <title>Re: SingleNode all-purpose cluster for small ETLs</title>
      <link>https://community.databricks.com/t5/data-engineering/singlenode-all-purpose-cluster-for-small-etls/m-p/32455#M23650</link>
      <description>&lt;P&gt;In this sense, they mean shared among many users. If you had 4 different users submitting jobs to a single node cluster, you'd have some trouble with resource balancing.&lt;/P&gt;&lt;P&gt;If what you're doing is currently working, keep doing it!&lt;/P&gt;</description>
      <pubDate>Fri, 31 Dec 2021 00:15:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/singlenode-all-purpose-cluster-for-small-etls/m-p/32455#M23650</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2021-12-31T00:15:16Z</dc:date>
    </item>
    <item>
      <title>Re: SingleNode all-purpose cluster for small ETLs</title>
      <link>https://community.databricks.com/t5/data-engineering/singlenode-all-purpose-cluster-for-small-etls/m-p/32456#M23651</link>
      <description>&lt;P&gt;Exactly what @Joseph Kambourakis&amp;nbsp;said. Single node clusters are designed to be used for single-user machine learning use cases. Think of them as a laptop in the sky.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;@E H&amp;nbsp;your use case is a really good one; we get this all the time. We are working hard to bring serverless clusters to the Data Science &amp;amp; Engineering Workspace. Once we have those, you will get super fast startup time. Is that the ideal solution in your mind for your use case?&lt;/P&gt;</description>
      <pubDate>Mon, 03 Jan 2022 14:55:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/singlenode-all-purpose-cluster-for-small-etls/m-p/32456#M23651</guid>
      <dc:creator>BilalAslamDbrx</dc:creator>
      <dc:date>2022-01-03T14:55:03Z</dc:date>
    </item>
    <item>
      <title>Re: SingleNode all-purpose cluster for small ETLs</title>
      <link>https://community.databricks.com/t5/data-engineering/singlenode-all-purpose-cluster-for-small-etls/m-p/32457#M23652</link>
      <description>&lt;P&gt;Hi @Bilal Aslam,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Serverless clusters would definitely help with the speed required for the small jobs of most of my clients.&lt;/P&gt;&lt;P&gt;That being said, most of these clients require calculating the "worst case" for most technologies when presenting a business case.&lt;/P&gt;&lt;P&gt;Right now, I am able to do so using interactive clusters, since I can assume the worst case (744 hours), knowing that the jobs will be queued and will thus respect the budget if that happens.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Will it be possible to set quotas to achieve the same thing? (I.e., ensure there are no unexpected high charges, such as an infinite loop caused by a user, instead of relying on email alerts and custom scripts to detect such errors.)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If this is achievable, this is exactly what we need.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Mon, 03 Jan 2022 17:39:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/singlenode-all-purpose-cluster-for-small-etls/m-p/32457#M23652</guid>
      <dc:creator>RicksDB</dc:creator>
      <dc:date>2022-01-03T17:39:16Z</dc:date>
    </item>
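<!--
The worst-case reasoning in the post above (an always-on cluster billed for all 744 hours of a 31-day month) can be sketched as simple arithmetic. This is only a minimal sketch, not the Databricks billing formula: the function name and every rate below are hypothetical placeholders, and real prices depend on the VM type, pricing tier, and region.

```python
# Worst-case monthly cost for a cluster that never shuts down.
# All rates used here are hypothetical placeholders, not real prices.

HOURS_IN_31_DAY_MONTH = 31 * 24  # 744 hours, the worst case cited in the post


def worst_case_monthly_cost(
    dbu_per_hour: float,
    price_per_dbu: float,
    vm_price_per_hour: float = 0.0,
    hours: int = HOURS_IN_31_DAY_MONTH,
) -> float:
    """Upper bound on spend: every hour of the month is billed."""
    hourly_cost = dbu_per_hour * price_per_dbu + vm_price_per_hour
    return hours * hourly_cost


# Hypothetical single node cluster: 0.75 DBU/hour at 0.40 per DBU,
# plus 0.20 per hour for the underlying VM.
cost = worst_case_monthly_cost(
    dbu_per_hour=0.75,
    price_per_dbu=0.40,
    vm_price_per_hour=0.20,
)
print(f"Worst-case monthly cost: {cost:.2f}")
```

Because jobs submitted to a single always-on cluster queue up rather than adding nodes, this bound holds regardless of how many jobs are submitted, which is the property the poster relies on for budgeting.
-->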
    <item>
      <title>Re: SingleNode all-purpose cluster for small ETLs</title>
      <link>https://community.databricks.com/t5/data-engineering/singlenode-all-purpose-cluster-for-small-etls/m-p/32458#M23653</link>
      <description>&lt;P&gt;@E H&amp;nbsp;we're definitely thinking about budgets and quotas for jobs. There are several things we can do, ranked in order of rough complexity-to-implement:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Display the DBU cost of each job in the Jobs UI.&lt;/LI&gt;&lt;LI&gt;Alert on the DBU cost of a job (e.g. "Alert me if this job costs &amp;gt;20 DBUs")&lt;/LI&gt;&lt;LI&gt;Alert on the dollar cost of a job (e.g. "Alert me if this job costs &amp;gt;$5")&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thoughts on what you'd prefer?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 04 Jan 2022 10:36:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/singlenode-all-purpose-cluster-for-small-etls/m-p/32458#M23653</guid>
      <dc:creator>BilalAslamDbrx</dc:creator>
      <dc:date>2022-01-04T10:36:44Z</dc:date>
    </item>
    <item>
      <title>Re: SingleNode all-purpose cluster for small ETLs</title>
      <link>https://community.databricks.com/t5/data-engineering/singlenode-all-purpose-cluster-for-small-etls/m-p/32459#M23654</link>
      <description>&lt;P&gt;@Bilal Aslam&amp;nbsp;In my case, it usually depends on the customers and their SLA. Most of them do not have a "true" high SLA requirement and thus prefer that jobs be throttled when the actual cost is within a certain range of the budget, instead of scaling indefinitely.&lt;/P&gt;&lt;P&gt;In an ideal world, options 1 and 3 would be implemented. Option 3 would be configurable to optionally add throttling when required.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The throttling feature would be used to estimate the worst case.&lt;/P&gt;</description>
      <pubDate>Tue, 04 Jan 2022 13:52:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/singlenode-all-purpose-cluster-for-small-etls/m-p/32459#M23654</guid>
      <dc:creator>RicksDB</dc:creator>
      <dc:date>2022-01-04T13:52:06Z</dc:date>
    </item>
  </channel>
</rss>

