Re: Slow imports for concurrent notebooks

Hubert-Dudek · 05-07-2022

@Pantelis Maroudis, yes, as @Werner Stinckens said, that is parallelism on the driver, which will in any case send the work to the workers as Spark jobs in the queue, and every CPU core will work on one partition at a time. I used ThreadPool often in the past, then I stopped, as it is a bit pointless when your code is correct (designed to run on the executors, not on the driver) 🙂

For every notebook, reserve some resources using separate scheduler pools: spark.sparkContext.setLocalProperty("spark.scheduler.pool", "pool name")
You can also just set the notebooks to run in parallel using jobs/tasks: one ***** task, with all other tasks depending on that one task, as in the image attached to this post.
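The first option, a separate scheduler pool per notebook, could be sketched roughly as below. This is only an illustration, not a tested Databricks recipe: the helper name `scheduler_pool` and the pool name `"pool_a"` are made up for the example, `spark` is assumed to be the session object the notebook already provides, and pools only affect scheduling when the fair scheduler is in use. Passing `None` to clear the property mirrors what the Spark docs describe for Scala (`null`) and is assumed here to behave the same way from PySpark.

```python
from contextlib import contextmanager


@contextmanager
def scheduler_pool(spark, pool_name):
    """Hypothetical helper: assign Spark jobs started inside the
    `with` block (on the current thread) to the given scheduler pool."""
    sc = spark.sparkContext
    # Same call as in the post; it is a thread-local property.
    sc.setLocalProperty("spark.scheduler.pool", pool_name)
    try:
        yield
    finally:
        # Assumption: None clears the property, so later jobs on this
        # thread fall back to the default pool.
        sc.setLocalProperty("spark.scheduler.pool", None)
```

In a notebook it might be used as `with scheduler_pool(spark, "pool_a"): df.write.saveAsTable(...)`, with each notebook choosing its own pool name.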

My blog: https://databrickster.medium.com/