12-08-2022 12:30 PM
I have a custom application/executable that I upload to DBFS and transfer to my cluster's local storage for execution. I want to call multiple instances of this application in parallel, which I've only been able to successfully do with Python's subprocess.Popen(). However, doing it this way doesn't take advantage of autoscaling.
As a quick code example of what I'm trying to do:
import subprocess

# Hundreds of custom configurations here
ListOfCustomArguments = ["/path/to/config1.txt", "/path/to/config2.txt"]

# Launch one instance of the executable per configuration file
processes = []
for arg in ListOfCustomArguments:
    command = "/path/to/executable " + arg
    processes.append(subprocess.Popen(command, shell=True))

# Wait for all instances to finish
for p in processes:
    p.wait()

print("Done!")
As is, this will not auto-scale. Any ideas?
Labels:
- Cluster Autoscaling
- Python
- Use
Accepted Solutions

12-08-2022 04:18 PM
Autoscaling works for Spark jobs only. It works by monitoring the Spark job queue, which plain Python code never enters. If it's just Python code, try a single-node cluster instead.
https://docs.databricks.com/clusters/configure.html#cluster-size-and-autoscaling
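Following that reasoning, one way to make this kind of workload visible to the autoscaler is to distribute the argument list as a Spark RDD and invoke the executable from each task, so the work shows up as Spark tasks. The sketch below is not from the original reply; it assumes the executable has already been copied to the same local path on every worker node, that sc (the SparkContext Databricks exposes in notebooks) is available, and the paths are placeholders taken from the question.

import subprocess

ListOfCustomArguments = ["/path/to/config1.txt", "/path/to/config2.txt"]

def run_config(arg):
    # Runs one instance of the executable on whichever worker receives this task
    result = subprocess.run(["/path/to/executable", arg],
                            capture_output=True, text=True)
    return (arg, result.returncode)

# Parallelizing the argument list turns the work into Spark tasks, which the
# scheduler can spread across workers and the autoscaler can react to.
rdd = sc.parallelize(ListOfCustomArguments, numSlices=len(ListOfCustomArguments))
results = rdd.map(run_config).collect()
print(results)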

12-09-2022 06:23 AM
Nice response @Joseph Kambourakis

