06-29-2022 02:51 AM
I used Python's concurrent.futures to call a function multiple times concurrently. However, I am not sure whether all nodes are utilised, or how to make sure it uses all cluster nodes.
Can you confirm: if I create a cluster with 5 workers, each with 8 cores, for example, does that mean I can run 5 x 8 concurrent tasks?
Or will futures and Python use only the main node for every task?
Code example:
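from concurrent.futures import ProcessPoolExecutor

# assuming 5 workers, each with 8 cores
executor = ProcessPoolExecutor(5 * 8)

def tester():
    # code to run any parallel task
    result = None  # placeholder for the real result
    return result

for index in range(10000):
    executor.submit(tester)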
In other words, can Python's futures or any Python threading library use the CPUs of all cluster workers?
Labels: Compute, Databricks Cluster, Python, Threading, Workers
Accepted Solutions
06-29-2022 03:36 AM
It will run on the main node (driver) only.
You need some kind of cluster management framework to distribute the work over the nodes, such as YARN, Spark, Dask, or Ray.
If you use PySpark, you can leverage Spark's parallel processing and it will indeed run over multiple nodes, provided your function uses Spark.
06-29-2022 09:44 AM
You can create an init script and then add it to the cluster so that it runs during cluster startup.
06-29-2022 10:52 AM
Can you elaborate on which init script to add to the cluster?