Why Databricks spawns multiple jobs

pawelmitrus
New Contributor III

I have a Delta table `spark101.airlines` (sourced from `/databricks-datasets/airlines/`) partitioned by `Year`. My `spark.sql.shuffle.partitions` is set to the default of 200. I run a simple query:

select Origin, count(*) 
from spark101.airlines
group by Origin

Stage 1: Data is read into 17 partitions, which is consistent with my `spark.sql.files.maxPartitionBytes` setting. This stage also pre-aggregates the data within each task and writes the shuffle output into 200 partitions.

What I would expect:

Stage 2: It should spawn 200 tasks to read and aggregate partitions from the previous stage.

What I've got instead:

[image]

All the other stages add up to 200, but why are separate jobs spawned?

4 REPLIES

-werners-
Esteemed Contributor III

Jobs get spawned on actions.

So it seems you have multiple actions in your code.

Is the code snippet you posted the whole notebook?

I think it is something that Databricks does when running a query whose result is returned to the notebook. When I write the result of this SQL statement to storage, there is only 1 job with 2 stages, as expected.

pawelmitrus
New Contributor III

Yeah, this is all I've got. The things I should also mention:

  • Databricks Runtime 10.4 LTS
  • I have disabled AQE

It looks like Databricks has some kind of approach to creating jobs / stages that works like this:

  • start with 1
  • multiply by 4, if not enough then...
  • multiply by 5, if not enough then...
  • multiply by 5, if not enough then...
  • take the rest

so eventually it is 1 + 4 + 20 + 100 + 75 = 200
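
The sequence above can be reproduced with one simple rule, sketched below in plain Python. This is only an illustration under an assumption that is not confirmed in this thread: the first job scans 1 partition, and each later job scans at most 4 times the number of partitions already scanned, capped by what is left (4 = 4 × 1, 20 = 4 × 5, 100 = 4 × 25, then the remaining 75). The function name `partitions_per_job` and its parameters are hypothetical. The rule resembles the way Spark scales up the number of partitions scanned per attempt when collecting a limited number of rows (see `spark.sql.limit.scaleUpFactor`, default 4), but whether the notebook does exactly that here is a guess.

```python
from typing import List


def partitions_per_job(total_partitions: int, scale_up_factor: int = 4) -> List[int]:
    """Return how many partitions each successive job would scan.

    Assumed rule (not confirmed in this thread): the first job scans one
    partition; every later job scans at most `scale_up_factor` times the
    number of partitions already scanned, capped by the partitions remaining.
    """
    scanned = 0
    jobs: List[int] = []
    while scanned < total_partitions:
        # First attempt tries a single partition, later ones scale up.
        to_try = 1 if scanned == 0 else scanned * scale_up_factor
        # Never schedule more tasks than there are partitions left.
        to_try = min(to_try, total_partitions - scanned)
        jobs.append(to_try)
        scanned += to_try
    return jobs


print(partitions_per_job(200))  # [1, 4, 20, 100, 75]
```

If this reading is right, the job sizes depend only on the shuffle partition count and the scale-up factor, which would explain why the stages always add up to 200.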

User16753725469
Contributor II

Could you please paste the query plan here so we can analyse the issue?
