Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Can Python futures utilise all cluster nodes?

Alex0101
New Contributor II

I used Python futures (concurrent.futures) to call a function many times concurrently, but I am not sure whether all nodes are utilised, or how to make sure the work uses all cluster nodes.

Can you confirm: if I create a cluster with 5 workers, each with 8 cores, does that mean I can run 5 x 8 = 40 concurrent tasks?

Or will futures and Python use only the main node for all tasks?

Code example:

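from concurrent.futures import ProcessPoolExecutor

def tester():
    # code to run any parallel task
    result = None  # placeholder for the task's output
    return result

# assuming 5 workers, each with 8 cores
with ProcessPoolExecutor(5 * 8) as executor:
    futures = [executor.submit(tester) for index in range(10000)]
    results = [future.result() for future in futures]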

In other words, can Python futures, or any Python threading library, use the CPUs of all cluster workers?

1 ACCEPTED SOLUTION

-werners-
Esteemed Contributor III

It will run on the main node (the driver) only.
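One way to check this yourself is to have each task report the host it ran on. The snippet below is a minimal sketch, not a definitive diagnostic; the helper names `where_am_i` and `hosts_used` are made up for illustration. On a Databricks cluster you should expect the returned set to contain a single hostname, the driver's.

```python
import os
import socket
from concurrent.futures import ProcessPoolExecutor


def where_am_i(_):
    # Report the host and the process that executed this task.
    return socket.gethostname(), os.getpid()


def hosts_used(n_tasks=100, max_workers=4):
    # Run n_tasks through a local process pool and collect the distinct
    # hostnames that executed them.
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(where_am_i, range(n_tasks)))
    return {host for host, _pid in results}


if __name__ == "__main__":
    # Prints the set of hosts that ran the tasks.
    print(hosts_used())
```

The process IDs will differ between tasks, which shows that several cores on that one machine are in use, but the worker nodes never receive any of the tasks.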

You need some kind of cluster management framework to distribute the work over the nodes, such as YARN, Spark, Dask, Ray, etc.

If you use PySpark, you can leverage Spark's parallel processing, and the work will indeed run across multiple nodes, provided your function uses Spark.


3 REPLIES

Keyuri
New Contributor II

You can create an init script and then add it at cluster startup.

Alex0101
New Contributor II

Can you elaborate on which init script to add to the cluster?
