08-09-2024 09:28 AM
I have a job with > 10 tasks in it that interacts with an external system outside of databricks. At the moment that external system cannot handle more than 3 of the tasks executing concurrently. How can I limit the number of tasks that concurrently execute in a job? I'm not particularly worried about the order in which they execute, only that the number at any one time is limited to 3.
The cluster that I execute this on currently has only 1 worker in it and I'm looking to limit what takes place on that single worker.
12-06-2024 10:36 PM
To limit the number of tasks that concurrently execute in a job to 3, you can use the max_concurrent_runs parameter in your job configuration. This parameter allows you to specify the maximum number of concurrent runs for a job, ensuring that no more than the specified number of tasks run at the same time.
When creating or updating your job, set the max_concurrent_runs parameter to 3. This will limit the number of concurrent tasks to 3.
The max_concurrent_runs parameter will handle the concurrency limit regardless of the cluster size.
12-07-2024 12:32 AM
Hi @tgburrin-afs, @Mounika_Tarigop ,
As I understand the question is about running concurrent tasks within a single job rather than running concurrent jobs.
max_concurrent_runs controls how many times a whole job can run simultaneously, not the concurrency of tasks within a single job run.
There is currently no direct feature in Databricks Jobs to specify a maximum number of concurrently running tasks within a single job run. Instead, you need to control concurrency through task dependencies or application logic.
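For the task-dependency route, one pattern is to chain the tasks into a fixed number of "lanes" so that at most three are ever eligible to run at once. A hedged sketch of what the tasks array in a Jobs API job definition could look like (task keys are illustrative and the other task fields are omitted):

```json
{
  "tasks": [
    {"task_key": "t1"},
    {"task_key": "t2"},
    {"task_key": "t3"},
    {"task_key": "t4", "depends_on": [{"task_key": "t1"}]},
    {"task_key": "t5", "depends_on": [{"task_key": "t2"}]},
    {"task_key": "t6", "depends_on": [{"task_key": "t3"}]}
  ]
}
```

Each lane runs sequentially, so no more than three tasks execute at the same time. The trade-off is that one slow lane can sit idle work that another lane has already finished.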
Approaches to Limit Concurrent Tasks Within a Single Job Run
Implement Concurrency Control in Your Code:
If each Databricks task runs code that executes operations against the external system, implement concurrency control in that code.
For example, if a single task processes multiple items and you donโt want more than three operations to hit the external system concurrently, use a thread pool within the taskโs code:
from concurrent.futures import ThreadPoolExecutor

# Limit to 3 concurrent operations
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(process_item, item) for item in items_to_process]
    results = [f.result() for f in futures]

This approach requires merging multiple pieces of logic into a single task and controlling concurrency at the code level.
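If the pool needs more workers than the external system can tolerate, a semaphore can cap just the external calls while the rest of the work stays parallel. A minimal sketch, where `call_external`, the simulated delay, and the item range are placeholders for the real operation:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

limit = threading.Semaphore(3)   # at most 3 in-flight external calls
active = 0                       # instrumentation only: current in-flight count
peak = 0                         # instrumentation only: highest observed count
lock = threading.Lock()

def call_external(item):
    global active, peak
    with limit:                  # blocks when 3 calls are already running
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)         # placeholder for the real external call
        with lock:
            active -= 1
        return item * 2

# The pool itself can be larger than the external limit.
with ThreadPoolExecutor(max_workers=8) as executor:
    results = list(executor.map(call_external, range(10)))
```

Here `peak` will never exceed 3 regardless of the pool size, because the semaphore serializes entry into the external-call section.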
04-19-2025 12:39 AM
Same thing here; job concurrency is covered, but there is nothing for tasks. Some of our jobs have countless parallel tasks, so without controlling them the downstream servers grind to a halt and tasks just terminate.
It needs what we call a spinlock on tasks to control the concurrency in a global shared space. If your case allows for a Python wrapper, we can do something like this:
1. Create a volume;
2. Ensure each task has a unique key of sorts; use a parameter;
3. Do a spinlock on the volume with something like this:
import time
import random

#############
# Concurrency
#############
aat_vol = "/Volumes/{your_volume_path}"  # path to your Volume
aat = 1                                  # number of concurrent tasks to allow

def saat(s):
    # Start a task with the given key: spin until a slot is free,
    # then claim it by creating a directory named after the key.
    time.sleep(1 + random.randrange(1, 10))
    while True:
        entries = dbutils.fs.ls(aat_vol)
        if len(entries) < aat:
            dbutils.fs.mkdirs(aat_vol + "/" + s)
            break
        time.sleep(1 + random.randrange(1, 10))

def eaat(s):
    # End a task with the given key: release its slot.
    dbutils.fs.rm(aat_vol + "/" + s)

...
try:
    eaat(target)  # clear any stale lock left by a previous failed run
    saat(target)
    ...
except Exception as e:
    eaat(target)
    raise e
eaat(target)

aat_vol: the path to your volume
aat: the number of concurrent tasks you want to allow
saat: starts a task with the given key
eaat: ends a task with the given key
The random sleep is used to ensure the tasks do not spin up at the same time; in some cases (roughly a 1-in-10 probability) you will still get more than one initial spin. Always ensure you call eaat to clear the volume, even if the task fails.
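The same pattern can be sketched without dbutils for anyone who wants to try it locally first, using a temporary directory as a stand-in for the Volume and `os.mkdir` as the slot-claiming step (all names here are illustrative, not part of the original post):

```python
import os
import random
import tempfile
import time

LOCK_DIR = tempfile.mkdtemp()  # stand-in for the shared Volume path
MAX_CONCURRENT = 3             # the "aat" limit from the post above

def acquire(key):
    # Spin until fewer than MAX_CONCURRENT lock entries exist,
    # then claim a slot by creating a directory named after the key.
    while True:
        if len(os.listdir(LOCK_DIR)) < MAX_CONCURRENT:
            os.mkdir(os.path.join(LOCK_DIR, key))
            return
        time.sleep(1 + random.randrange(1, 10))

def release(key):
    # Release the slot; tolerate a slot that was never claimed.
    path = os.path.join(LOCK_DIR, key)
    if os.path.isdir(path):
        os.rmdir(path)
```

Usage mirrors the post: `acquire("task-42")`, do the work inside a `try`/`finally`, and call `release("task-42")` in the `finally` so a failure never leaves a stale slot. Note the check-then-create step is not atomic across concurrent callers, which is the same small race the random sleep in the original is papering over.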
07-22-2025 01:29 AM
I have a similar question, but I am more interested in understanding whether resources are allocated in an intelligent way. I guess the compute configuration should have an effect on how many tasks can run at the same time. But how exactly does it affect that?
My question is the following: suppose tasks A and B depend on tasks C and D. Other than this, there are 50 other tasks that do not have dependencies. Suppose that in the beginning, tasks C and D are started and, if there is a parallelism limit, 8 more tasks. Then task C finishes, but D is still running, so A and B cannot start yet. My question is: is the orchestration intelligent enough to start a new (9th) task from the 50 while waiting for D to finish? Or does it wait and waste resources in the meantime?
7 hours ago
Hi smoortema, it will wait. As for resource waste, the sleep is the least expensive thing you can do; you would need to gauge its impact on cost, but it should be minimal. If you want zero Databricks waste, your best option is to wire the 50 tasks in some sequential pattern, but this leads to longer run times and likely the opposite problem: underutilizing the downstream servers and missing the SLA, which is also waste. So we wire the sequential tasks in slots of, say, 3, which gives a good balance; but then you get 50 more of them and have to re-wire the whole situation again. Ideally you want to balance the concurrency to some degree: you don't want to run the 50 tasks concurrently, as that will kill the downstream server (my case), but you also don't want to run them one by one, as you would not be utilizing the downstream server properly and would likely miss the delivery times.
It's worth noting that in the new release there is the ability to start a task only when a table changes, for example, which is a great step in the right direction; but in most cases our prerequisites for starting a specific task are a bit more complex than that.
6 hours ago
Thanks!
What do you mean by this sentence? What do slots mean in this context?
"we wire the sequential tasks in slots of 3 maybe which gives you a good balance but then you get 50 more of them and have to re-wire the whole situation again"
6 hours ago
You do something like:
E1 E4
E2 Z E5 Z ...
E3 E6
So Z does not actually do anything; it's just a funnel that waits for the 3 tasks at a time to complete before the next 3 start.
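The funnel above maps directly onto task dependencies. A hedged sketch of the same wiring as a Jobs API tasks array, where Z is a no-op task and the keys are illustrative:

```json
{
  "tasks": [
    {"task_key": "E1"},
    {"task_key": "E2"},
    {"task_key": "E3"},
    {"task_key": "Z",
     "depends_on": [{"task_key": "E1"}, {"task_key": "E2"}, {"task_key": "E3"}]},
    {"task_key": "E4", "depends_on": [{"task_key": "Z"}]},
    {"task_key": "E5", "depends_on": [{"task_key": "Z"}]},
    {"task_key": "E6", "depends_on": [{"task_key": "Z"}]}
  ]
}
```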