Why Databricks spawns multiple jobs
07-24-2022 05:31 AM
I have a Delta table `spark101.airlines` (sourced from `/databricks-datasets/airlines/`) partitioned by `Year`. My `spark.sql.shuffle.partitions` is set to the default of 200. I run a simple query:
```sql
select Origin, count(*)
from spark101.airlines
group by Origin
```
Stage 1: Data is read into 17 partitions, which is consistent with my `spark.sql.files.maxPartitionBytes` setting. This stage also pre-aggregates the data within each task (partial aggregation) and writes its output into 200 shuffle partitions.
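How a read-partition count like 17 can follow from `spark.sql.files.maxPartitionBytes` can be sketched in plain Python. This is a rough model of open-source Spark 3.x file planning (a split size derived from `maxPartitionBytes`, `openCostInBytes` and the default parallelism, followed by "next fit decreasing" packing of file splits), assuming splittable Parquet files and the defaults of 128 MiB and 4 MiB. The function names, `default_parallelism` value and file sizes are hypothetical, and Databricks Runtime's Delta scan may plan splits differently.

```python
MiB = 1024 * 1024


def max_split_bytes(file_sizes, default_parallelism,
                    max_partition_bytes=128 * MiB, open_cost_in_bytes=4 * MiB):
    # Each file is charged its size plus a fixed "open cost".
    total_bytes = sum(size + open_cost_in_bytes for size in file_sizes)
    bytes_per_core = total_bytes // default_parallelism
    # Capped by maxPartitionBytes, but shrinks for small inputs so all cores get work.
    return min(max_partition_bytes, max(open_cost_in_bytes, bytes_per_core))


def count_read_partitions(file_sizes, default_parallelism,
                          max_partition_bytes=128 * MiB, open_cost_in_bytes=4 * MiB):
    split = max_split_bytes(file_sizes, default_parallelism,
                            max_partition_bytes, open_cost_in_bytes)
    # Cut every (splittable) file into chunks of at most `split` bytes.
    chunks = [min(split, size - offset)
              for size in file_sizes
              for offset in range(0, size, split)]
    chunks.sort(reverse=True)  # largest chunks first
    # "Next fit decreasing": close the current partition when the chunk would overflow it.
    partitions, current_size, current_count = 0, 0, 0
    for chunk in chunks:
        if current_count and current_size + chunk > split:
            partitions += 1
            current_size, current_count = 0, 0
        current_size += chunk + open_cost_in_bytes
        current_count += 1
    if current_count:
        partitions += 1
    return partitions
```

For example, with a hypothetical `default_parallelism=8`, a single 1 GiB file yields 8 read tasks, and 100 files of 1 MiB are packed into 8 tasks rather than 100.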
What I would expect:
Stage 2: It should spawn 200 tasks to read and aggregate partitions from the previous stage.
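The expected plan (partial aggregation in each map task, a shuffle into `spark.sql.shuffle.partitions` buckets, then a final aggregation with exactly one task per bucket) can be modeled as a toy in plain Python. This is a sketch of the idea, not Spark's implementation: `zlib.crc32` stands in for Spark's Murmur3-based hash partitioning, so the real partition ids differ, and all helper names are hypothetical.

```python
import zlib
from collections import Counter


def shuffle_partition(key, num_partitions):
    # Stand-in for Spark's hash partitioning: same key -> same reduce partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def map_task(rows, num_partitions):
    # Stage 1 task: partial count per Origin within this input split only.
    partial = Counter(rows)
    blocks = [{} for _ in range(num_partitions)]
    for origin, cnt in partial.items():
        blocks[shuffle_partition(origin, num_partitions)][origin] = cnt
    return blocks  # one shuffle block per reduce partition


def reduce_task(blocks):
    # Stage 2 task: merge the partial counts for the keys it owns.
    total = Counter()
    for block in blocks:
        total.update(block)
    return total


def group_by_count(input_partitions, num_partitions=200):
    map_outputs = [map_task(rows, num_partitions) for rows in input_partitions]
    # One reduce task per shuffle partition, i.e. 200 tasks in a single stage.
    return [reduce_task([m[p] for m in map_outputs]) for p in range(num_partitions)]
```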
What I got instead:
The remaining stages add up to 200 tasks, but why are they spawned as separate jobs?
- Labels: Delta, JOBS, Multiple Jobs
07-25-2022 12:26 AM
Jobs get spawned by actions, so it seems you have multiple actions in your code.
Is the code snippet you posted the whole notebook?
07-31-2022 06:05 AM
I think this is something Databricks does when a query's result is returned to the notebook. When I write the result of this SQL statement to storage instead, there is only 1 job with 2 stages, as expected.
07-26-2022 10:09 AM
Yeah, this is all I've got. I should also mention:
- Databricks Runtime 10.4 LTS
- AQE (adaptive query execution) is disabled
It looks like Databricks creates the jobs/stages as follows:
- start with 1
- multiply by 4, if not enough then...
- multiply by 5, if not enough then...
- multiply by 5, if not enough then...
- take the rest
So the job sizes end up as 1, 4, 20, 100 and 75 tasks, which add up to 200.
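The 4, 5, 5 multipliers look like a single rule in disguise: each new job tries `spark.sql.limit.scaleUpFactor` (default 4) times the number of partitions already scanned, i.e. 4 × 1, 4 × 5, 4 × 25, and the last job takes the remaining 75. That matches how open-source Spark's incremental take/limit path (`SparkPlan.executeTake`) behaves, so a plausible but unconfirmed explanation is that returning results to the notebook fetches only a limited number of rows through that path, while a write scans all 200 shuffle partitions in one job. The sketch below simulates that loop; the function name, the row limit `n` and the uniform `rows_per_part` are hypothetical simplifications.

```python
import math


def take_job_sizes(total_parts, rows_per_part, n, scale_up_factor=4):
    """Partitions scanned by each job when only the first n rows are collected.

    Modeled on open-source Spark's SparkPlan.executeTake; assumes every
    partition yields the same number of rows (a simplification).
    """
    scale_up = max(scale_up_factor, 2)  # Spark enforces a minimum factor of 2
    collected, scanned, jobs = 0, 0, []
    while collected < n and scanned < total_parts:
        num_to_try = 1  # the first job scans a single partition
        if scanned > 0:
            if collected == 0:
                # Nothing found yet: grow by the scale-up factor.
                num_to_try = scanned * scale_up
            else:
                # Estimate the partitions still needed (+50%), capped by the factor.
                left = n - collected
                num_to_try = math.ceil(1.5 * left * scanned / collected)
                num_to_try = min(num_to_try, scanned * scale_up)
        parts = min(scanned + num_to_try, total_parts) - scanned
        jobs.append(parts)  # one Spark job per iteration
        collected = min(n, collected + parts * rows_per_part)
        scanned += parts
    return jobs
```

With 200 shuffle partitions, a few rows per partition and a large row limit, `take_job_sizes(200, 2, 10_000)` returns `[1, 4, 20, 100, 75]`. In the Spark UI, only the first of these jobs would run the map-side stage; later jobs reuse its shuffle output, which appears as skipped stages.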
09-01-2022 12:01 AM
Could you please paste the query plan here so we can analyse the issue?

