Better Worker Node Core Utilisation

DennisB
New Contributor III

Hi everyone,

Hoping someone can help me with this problem. I have an embarrassingly parallel workload, which I'm parallelising over 4 worker nodes (of type Standard_F4, so 4 cores each). Each workload is single-threaded, so I believe only one core is actually being utilised per task. Ideally, I'd like to run 2+ tasks on each worker.
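
For reference, the kind of driver-side fan-out I mean is roughly this (a simplified sketch only -- run_task and inputs are placeholders, not my actual code):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

def run_task(item):
    # stand-in for the real single-threaded workload
    return item * 2

inputs = list(range(16))                 # one element per independent piece of work
results = (
    sc.parallelize(inputs, len(inputs))  # one partition per element
      .map(run_task)
      .collect()
)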

I've tried increasing the number of executors (having more than one per worker) by means of the following, but it doesn't seem to work.

spark.executor.cores 1
spark.executor.memory 2g
# 4 workers * 4 cores = 16 single-core executors
spark.executor.instances 16

I've also tried dynamic allocation of executors, per the answer to the Stack Overflow thread "How to set amount of Spark executors?", but that's also not working.
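
For reference, the dynamic-allocation settings in question are along these lines (the min/max values here are just illustrative, not necessarily what I used):

# min/max values below are illustrative
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.minExecutors 4
spark.dynamicAllocation.maxExecutors 16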

Any help would be much appreciated. I can furnish more details if required.


4 REPLIES

Tharun-Kumar
Honored Contributor II

Hi @DennisB 

Are you using a thread pool to run the workloads in parallel?

Also, I would suggest removing the three configs you mentioned. If you then create a thread pool of 16, you should see 16 runs happening on the executor side (16 cores being utilized).
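
A minimal sketch of the driver-side thread pool pattern I'm describing (run_one, paths, and the parquet read are hypothetical stand-ins for your actual workload):

from concurrent.futures import ThreadPoolExecutor
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def run_one(path):
    # hypothetical unit of work: each call launches its own Spark job,
    # and the scheduler runs those jobs' tasks concurrently across free executor cores
    return spark.read.parquet(path).count()

paths = [f"/tmp/data/part_{i}" for i in range(16)]   # hypothetical inputs

with ThreadPoolExecutor(max_workers=16) as pool:
    results = list(pool.map(run_one, paths))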

DennisB
New Contributor III

I haven't tried a thread pool, but thanks for the suggestion. Before posting I had tried a multiprocessing Pool, but that didn't work. My hope was to use Spark to distribute the work to the worker nodes, then multiprocessing to distribute it across each node's cores, but I couldn't get that to work -- I didn't think to try a thread pool, though.
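
For reference, the pattern I was attempting was roughly this (simplified; run_task and inputs are placeholders) -- this is the part I couldn't get to work:

import multiprocessing
from pyspark.sql import SparkSession

sc = SparkSession.builder.getOrCreate().sparkContext

def run_task(item):
    # placeholder for the real single-threaded function
    return item * 2

def process_partition(items):
    # each partition lands on one worker; fan out across its 4 cores locally
    with multiprocessing.Pool(processes=4) as pool:
        return pool.map(run_task, list(items))

inputs = list(range(16))     # placeholder inputs
results = sc.parallelize(inputs, 4).mapPartitions(process_partition).collect()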

Anonymous
Not applicable

Hi @DennisB 

We haven't heard from you since the last response from @Tharun-Kumar, and I was checking back to see if their suggestions helped you.

Otherwise, if you have found a solution, please share it with the community, as it can be helpful to others.

Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.

DennisB
New Contributor III

(Accepted solution)

So I managed to get the 1-core-per-executor setup working. The part that wasn't working was spark.executor.memory -- the value was too high. Lowering it so that the sum of the executors' memory was ~90% of the worker node's memory allowed it to work properly.
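
For anyone hitting the same thing, the working configuration looks roughly like this (the memory value is illustrative -- the point is that 4 executors per node times spark.executor.memory should stay within ~90% of each worker's memory):

spark.executor.cores 1
spark.executor.instances 16
# illustrative value: 4 executors per node * ~1.8g is roughly 90% of an 8g Standard_F4 worker
spark.executor.memory 1800m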
