Data Engineering

ThreadPoolExecutor in Databricks

uzairm
New Contributor III

I am using a ThreadPoolExecutor to run notebooks in parallel. However, these parallel notebooks do not use the executors at all. All of the load goes to the driver node, which runs out of memory and eventually crashes.

The parallel notebooks are all the same: each one creates huge pandas DataFrames and Spark DataFrames and appends them to Delta tables. What am I missing? How do I redirect the load to the executor nodes?

Accepted Solution

Anonymous
Not applicable

@uzair mustafa: Using a thread pool executor to parallelize the execution of notebooks may not be enough to distribute the load across your cluster. When you use a thread pool executor, all threads run on the same node, so that node can run out of memory as well. This is the expected result.
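To make the point about threads sharing one node concrete, here is a minimal sketch in plain Python (no Spark or Databricks APIs). The names `describe_worker` and `run_tasks` are hypothetical helpers introduced only for illustration. Every task submitted to the pool reports the same process ID as the caller, which is why memory allocated by each thread, such as a large pandas DataFrame, adds up inside a single process.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def describe_worker(task_id):
    """Report which process and thread handled a task (hypothetical helper)."""
    return {"task": task_id, "pid": os.getpid(), "thread": threading.get_ident()}


def run_tasks(n_tasks=4, max_workers=4):
    """Run n_tasks on a thread pool and collect where each one executed."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(describe_worker, range(n_tasks)))


results = run_tasks()
pids = {r["pid"] for r in results}

# All worker threads live in the parent process, so only one pid appears.
print(f"parent pid: {os.getpid()}, worker pids: {pids}")
```

The thread identifiers may differ from task to task, but the set of process IDs contains only the parent's.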

To tackle your problem, can you try running each notebook as a separate process and creating a Spark context within that process? Please try using the "subprocess" module in Python to spawn a new process for each notebook.
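A minimal sketch of the "one process per job" idea with the standard `subprocess` module might look like the following. It is an illustration under assumptions, not a tested Databricks recipe: `launch`, `run_all`, and the inline `child_sources` strings are hypothetical stand-ins for your notebook logic exported as Python code, and the sketch does not create a Spark context, since whether a child process can do so depends on your environment.

```python
import os
import subprocess
import sys


def launch(script_source):
    """Start a separate Python process that runs the given source code."""
    return subprocess.Popen(
        [sys.executable, "-c", script_source],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )


def run_all(sources):
    """Start every child first, then wait for each and collect its result."""
    procs = [launch(src) for src in sources]
    results = []
    for proc in procs:
        out, err = proc.communicate()  # waits for this child to exit
        results.append((proc.returncode, out.strip(), err.strip()))
    return results


# Placeholder jobs: each child only prints its own process ID.
child_sources = [
    f"import os; print('job {i} in pid', os.getpid())" for i in range(3)
]
outcomes = run_all(child_sources)

for returncode, out, err in outcomes:
    print(returncode, out, err)
```

Each child has its own process ID and its own memory space, but `subprocess` starts the children on the same machine as the parent, so the total memory they use still counts against that node.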


