Options
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
05-07-2022 04:24 AM
@Pantelis Maroudis , yes as @Werner Stinckens said it is parallelism on driver which will send anyway as spark jobs in the queue to workers, and every CPU will work step by step on 1 partition at the same time... I used ThreadPool often in the past then I stopped as it is a bit nonsense in case when your code is correct (is designed to work on executors not on driver) 🙂
- for every notebook reserve, some resources using separate pools spark.sparkContext.setLocalProperty("spark.scheduler.pool", "pool name")
- you can just set them to run in parallel using jobs/tasks - one ***** task and all other tasks depended on that 1 task as on that image:
My blog: https://databrickster.medium.com/