
Databricks Execution Context scheduling

tariq
New Contributor III
Databricks allows a cluster to have a maximum of 150 execution contexts. If a number of execution contexts are attached and running operations on a Databricks cluster with a driver and n workers, how is the execution scheduled? I am assuming that only the driver cores can actually run the main execution context, with any Spark job being offloaded to the workers. How does the scheduling for different execution contexts work? https://kb.databricks.com/notebooks/too-many-execution-contexts-are-open-right-now

-werners-
Esteemed Contributor III

I am not sure what your question is exactly.
The limit of 150 execution contexts is per cluster. If you schedule that many Spark programs, you can easily avoid this limit by using job clusters (a sketch of that approach is below).

Can you explain a bit more what you want to know?
E.g. the innards of the Databricks resource scheduler when two Spark programs run at the same time, or how Spark distributes work, or ...?
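
As a rough illustration of the job-cluster approach: a minimal sketch that creates a job backed by its own job cluster through the Databricks Jobs API 2.1, so each scheduled Spark program gets its own cluster rather than an execution context on a shared all-purpose cluster. The workspace URL, token, notebook path, node type, and runtime version here are placeholders, not values from this thread.

```python
import requests

# Placeholders - replace with your workspace URL, a personal access token,
# and a real notebook path.
DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

job_spec = {
    "name": "nightly-etl-on-job-cluster",
    "tasks": [
        {
            "task_key": "etl",
            "notebook_task": {"notebook_path": "/Repos/team/etl/main"},
            # A job cluster is created for this run and torn down afterwards,
            # so it does not consume execution contexts on a shared cluster.
            "new_cluster": {
                "spark_version": "14.3.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 2,
            },
        }
    ],
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
)
resp.raise_for_status()
print("Created job:", resp.json()["job_id"])
```

The same thing can be configured from the Jobs UI; the point is only that each run gets dedicated compute instead of competing for contexts on one interactive cluster.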

tariq
New Contributor III

My question is: if the number of execution contexts is greater than the number of cores available on the cluster's driver, how does Databricks schedule the execution, since one execution context will require at least one core to run on? Is it round-robin or something else? To keep things simple, let's assume no Spark code is being executed, so there won't be any offloading to the workers.

-werners-
Esteemed Contributor III

OK, I get it.
An execution context is not bound to a CPU. It is like a session. So basically the limit of 150 execution contexts means that 150 sessions/Spark programs can run simultaneously on the cluster (whether the hardware can actually handle that is another question).
Knowing that, your question is in fact:
if the number of Spark tasks is greater than the number of cores available on the driver...
First: the driver only does orchestration; it does not run Spark tasks (it does run native Python code though, and handles calls like collect()). The workers execute the tasks.
If there are more tasks than there are CPUs available across all workers, then either extra nodes are created (if you use autoscaling), or the tasks wait until resources become available.
This actually happens a lot, especially on tables with many partitions (which often exceed the number of cores).
Spark can handle this without any issue; see the sketch below.
Timeouts can occur, however.
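
To make that scheduling behaviour concrete, here is a small PySpark sketch (a hypothetical example, assuming it is run in a Databricks notebook where a SparkSession already exists) that deliberately creates far more partitions, and therefore tasks, than the workers have cores; Spark queues the surplus tasks and runs them in waves as cores free up.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already provided as `spark` on Databricks

# Total cores Spark sees across the workers (the driver only schedules tasks,
# it does not execute them).
total_cores = spark.sparkContext.defaultParallelism
print(f"Worker cores available: {total_cores}")

# Deliberately create many more partitions than cores: one task per partition.
num_partitions = total_cores * 20
df = spark.range(0, 100_000_000, numPartitions=num_partitions)

# Each partition becomes one task; at most `total_cores` run concurrently,
# the rest wait in the scheduler queue until a core frees up.
print(f"Tasks in this stage: {df.rdd.getNumPartitions()}")
print(df.selectExpr("sum(id)").collect())
```

In the Spark UI you should see the stage fan out into num_partitions tasks with at most total_cores running at any moment; the remainder simply sit in the queue, which is the waiting behaviour described above.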
