<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How can I use cluster autoscaling with intensive subprocess calls? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-can-i-use-cluster-autoscaling-with-intensive-subprocess/m-p/17733#M11696</link>
    <description>&lt;P&gt;I have a custom application/executable that I upload to DBFS and copy to my cluster's local storage for execution. I want to run multiple instances of this application in parallel, which I've only managed to do with Python's subprocess.Popen(). However, doing it this way doesn't take advantage of autoscaling.&lt;/P&gt;&lt;P&gt;Here is a quick example of what I'm trying to do:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import subprocess

ListOfCustomArguments = ["/path/to/config1.txt", "/path/to/config2.txt"]  # Hundreds of custom configurations here

# Start one process per configuration; all of them run concurrently on the driver
processes = []
for arg in ListOfCustomArguments:
    # Pass the arguments as a list so no shell is needed to split them
    processes.append(subprocess.Popen(["/path/to/executable", arg]))

# Block until every process has finished
for p in processes:
    p.wait()

print("Done!")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;As written, this does not trigger autoscaling. Any ideas?&lt;/P&gt;</description>
    <pubDate>Thu, 08 Dec 2022 20:30:22 GMT</pubDate>
    <dc:creator>KellenO</dc:creator>
    <dc:date>2022-12-08T20:30:22Z</dc:date>
    <item>
      <title>How can I use cluster autoscaling with intensive subprocess calls?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-can-i-use-cluster-autoscaling-with-intensive-subprocess/m-p/17733#M11696</link>
      <description>&lt;P&gt;I have a custom application/executable that I upload to DBFS and copy to my cluster's local storage for execution. I want to run multiple instances of this application in parallel, which I've only managed to do with Python's subprocess.Popen(). However, doing it this way doesn't take advantage of autoscaling.&lt;/P&gt;&lt;P&gt;Here is a quick example of what I'm trying to do:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import subprocess

ListOfCustomArguments = ["/path/to/config1.txt", "/path/to/config2.txt"]  # Hundreds of custom configurations here

# Start one process per configuration; all of them run concurrently on the driver
processes = []
for arg in ListOfCustomArguments:
    # Pass the arguments as a list so no shell is needed to split them
    processes.append(subprocess.Popen(["/path/to/executable", arg]))

# Block until every process has finished
for p in processes:
    p.wait()

print("Done!")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;As written, this does not trigger autoscaling. Any ideas?&lt;/P&gt;</description>
      <pubDate>Thu, 08 Dec 2022 20:30:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-can-i-use-cluster-autoscaling-with-intensive-subprocess/m-p/17733#M11696</guid>
      <dc:creator>KellenO</dc:creator>
      <dc:date>2022-12-08T20:30:22Z</dc:date>
    </item>
    <item>
      <title>Re: How can I use cluster autoscaling with intensive subprocess calls?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-can-i-use-cluster-autoscaling-with-intensive-subprocess/m-p/17734#M11697</link>
      <description>&lt;P&gt;Autoscaling works for Spark jobs only. It works by monitoring the Spark job queue, and plain Python code (such as subprocess calls made from the driver) never enters that queue. If your workload is just Python code, try a single-node cluster instead.&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/clusters/configure.html#cluster-size-and-autoscaling" target="test_blank"&gt;https://docs.databricks.com/clusters/configure.html#cluster-size-and-autoscaling&lt;/A&gt;&lt;/P&gt;</description>
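      <!--
      A minimal sketch of one way to act on this answer; it is not a method confirmed anywhere in this thread. Each executable call is wrapped in a function, and the calls are submitted as Spark tasks with sparkContext.parallelize(...).map(...), so the work goes through Spark's scheduler instead of running as Popen processes on the driver. Assumptions: EXECUTABLE is a hypothetical path, the executable must already exist at that local path on every worker (not only on the driver), and run_config / run_all_on_spark are illustrative helper names. Whether a given cluster actually scales up for this job still depends on its autoscaling settings.

```python
import subprocess

# Hypothetical path: the executable must already exist at this local path on
# every worker node, not just on the driver.
EXECUTABLE = "/path/to/executable"


def run_config(config_path, executable=EXECUTABLE):
    """Run one instance of the executable; return (config_path, exit code)."""
    result = subprocess.run(
        [executable, config_path], capture_output=True, text=True
    )
    return config_path, result.returncode


def run_all_on_spark(spark, configs, executable=EXECUTABLE):
    """Submit one Spark task per configuration and collect the exit codes.

    The calls run inside Spark tasks instead of as Popen processes started
    by driver-side Python, so the work is queued by Spark's scheduler.
    """
    # One partition per configuration gives one task per configuration
    rdd = spark.sparkContext.parallelize(configs, numSlices=max(1, len(configs)))
    return rdd.map(lambda cfg: run_config(cfg, executable)).collect()
```

      In a notebook this would be called as results = run_all_on_spark(spark, ListOfCustomArguments). With one task per configuration, each task holds one core by default; if the executable is itself multithreaded, raising spark.task.cpus may be worth considering so that tasks do not oversubscribe a worker.
      -->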
      <pubDate>Fri, 09 Dec 2022 00:18:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-can-i-use-cluster-autoscaling-with-intensive-subprocess/m-p/17734#M11697</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-12-09T00:18:17Z</dc:date>
    </item>
    <item>
      <title>Re: How can I use cluster autoscaling with intensive subprocess calls?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-can-i-use-cluster-autoscaling-with-intensive-subprocess/m-p/17735#M11698</link>
      <description>&lt;P&gt;Nice response @Joseph Kambourakis&lt;/P&gt;</description>
      <pubDate>Fri, 09 Dec 2022 14:23:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-can-i-use-cluster-autoscaling-with-intensive-subprocess/m-p/17735#M11698</guid>
      <dc:creator>tunstila</dc:creator>
      <dc:date>2022-12-09T14:23:13Z</dc:date>
    </item>
  </channel>
</rss>

