Hubert-Dudek
Databricks MVP

@Pantelis Maroudis​ , yes, as @Werner Stinckens​ said, a ThreadPool only gives you parallelism on the driver. The work is still submitted as Spark jobs, which wait in the queue and are sent to the workers, where each CPU core processes one partition at a time. I used ThreadPool often in the past, but I stopped because it makes little sense when your code is written correctly (designed to run on the executors, not on the driver) 🙂

  • for every notebook, reserve some resources by using separate scheduler pools: spark.sparkContext.setLocalProperty("spark.scheduler.pool", "pool name")
  • you can just set them to run in parallel using jobs/tasks: one ***** task, with all the other tasks depending on that one task, as in the attached image (image.png)
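
To illustrate the first option, here is a minimal sketch of a helper that assigns the jobs triggered by one piece of work to a named pool and then clears the setting again. The helper name run_in_pool and the pool name in the usage note are my own placeholders, not anything from Databricks, and I am assuming the cluster uses the FAIR scheduler mode, since pools only have an effect there.

```python
from typing import Any, Callable


def run_in_pool(spark: Any, pool_name: str, action: Callable[[], Any]) -> Any:
    """Run `action` with this thread's Spark jobs assigned to `pool_name`.

    `run_in_pool` is a hypothetical helper written for this example;
    `spark` is expected to be an active SparkSession.
    """
    sc = spark.sparkContext
    # Jobs submitted from this thread after this call go to the named pool.
    sc.setLocalProperty("spark.scheduler.pool", pool_name)
    try:
        return action()
    finally:
        # Setting the property to None clears it, so later jobs from this
        # thread fall back to the default pool.
        sc.setLocalProperty("spark.scheduler.pool", None)
```

In a notebook you would call it roughly like run_in_pool(spark, "pool name", lambda: df.count()), where df stands for whatever DataFrame you are working with. The property is set per thread, so each notebook (or thread) can use its own pool.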

My blog: https://databrickster.medium.com/