Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Max job concurrency per workspace

hari
Contributor

Per the documentation, a workspace is limited to 1k concurrent job runs.

Can somebody clarify how the concurrency limit is applied, i.e.:

  • Is it 1k concurrent runs across all jobs in the workspace?
  • Is it 1k concurrent runs for a single job?

Also, is there any way to increase this limit?

If so, what is the hard limit, and how do we increase the max concurrency limit?

1 ACCEPTED SOLUTION


-werners-
Esteemed Contributor III

I think it is the first one: 1k concurrent runs across all jobs in the workspace.

Increasing that does not seem to be possible.

That is how I interpret the docs anyway.


9 REPLIES


hari
Contributor

Yeah, the documentation is worded like the first option.

But the API doc also gives 1000 as the max concurrency for a single job (see the sketch below). While that is consistent with the first option, it could also mean the second one.

Some confirmation would be good.

Also, 1000 concurrent job runs per workspace seems low.
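For reference, a minimal sketch of where that per-job setting lives: max_concurrent_runs in the Jobs API 2.1 create-job payload. The workspace host, token, notebook path, and cluster settings below are placeholder assumptions, not values from this thread.

    # Hedged sketch: create a job with its per-job concurrency cap set to 1000
    # via the Jobs API 2.1. Host, token, notebook, and cluster are placeholders.
    import requests

    host = "https://<your-workspace>.cloud.databricks.com"  # placeholder
    token = "<personal-access-token>"                        # placeholder

    payload = {
        "name": "per-customer-ingest",
        "max_concurrent_runs": 1000,  # per-job cap; the workspace-wide limit is separate
        "tasks": [
            {
                "task_key": "ingest",
                "notebook_task": {"notebook_path": "/Repos/etl/ingest"},  # placeholder
                "new_cluster": {
                    "spark_version": "11.3.x-scala2.12",
                    "node_type_id": "i3.xlarge",
                    "num_workers": 2,
                },
            }
        ],
    }

    resp = requests.post(
        f"{host}/api/2.1/jobs/create",
        headers={"Authorization": f"Bearer {token}"},
        json=payload,
    )
    resp.raise_for_status()
    print(resp.json()["job_id"])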

-werners-
Esteemed Contributor III

Well, technically 1000 runs of the same job is the same as 1 run each of 1000 different jobs.

But I agree the docs could be clearer on that.

1000 concurrent jobs does not seem low to me. It would mean you run 1000 Spark jobs simultaneously, which is quite a lot. If each job only uses 4 vCores (probably it is more), we are already at 4000 vCores. If those 1000 jobs also start writing simultaneously to your data lake or database, you will have performance issues.

hari
Contributor

While I agree with your statement, I am not clear on what cost Databricks would incur by allowing more than 1k concurrent job runs.

We have some use cases where it would be best to have a job per customer; in that case we could easily cross 1k concurrent runs as we scale. Similarly, all the jobs could be writing to different tables.

I don't think we should assume that all the jobs would be writing to the same table, and even if they are, we can handle the concurrent writes by properly partitioning the table and using the partition keys in the merge queries (see the sketch below).
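For what it's worth, a minimal PySpark/Delta sketch of that kind of merge, where the partition column appears in the merge condition so concurrent runs for different customers touch disjoint partitions. The table and column names (events, customer_id, event_id) are illustrative assumptions.

    # Hedged sketch: concurrent upserts into one Delta table, isolated by partition.
    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    def upsert_for_customer(customer_id: str, updates_df):
        # Target table assumed to be partitioned by customer_id.
        target = DeltaTable.forName(spark, "events")
        (
            target.alias("t")
            .merge(
                updates_df.alias("s"),
                # Pinning the partition column in the condition lets Delta prune to a
                # single partition, so runs for different customers do not conflict.
                f"t.customer_id = '{customer_id}' "
                "AND t.customer_id = s.customer_id "
                "AND t.event_id = s.event_id",
            )
            .whenMatchedUpdateAll()
            .whenNotMatchedInsertAll()
            .execute()
        )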

-werners-
Esteemed Contributor III

Every storage system has its limits concerning throughput. It might be the disks or the NICs.

Also, when you talk about scaling: the power of Spark lies not in the capability to run a ton of jobs in parallel, but in dividing a workload into tasks and processing those tasks in parallel on multiple workers.

hari
Contributor

I am not clear on why concurrency would be limited by the file system. It seems strange, since we could produce the same volume of writes with fewer than 1k concurrent jobs simply by increasing the number of worker nodes or cores. So if the concurrency limit were due to a file system limitation, it should vary with the worker node configuration.

I understand that Spark is meant to work with a large amount of data split across workers. Sorry, I might not have been clear on our use case. We actually have use cases where the task to be performed varies with each customer.

Our pipeline formats customer data into a single unified format. After this stage, we can process the entire dataset with a single job. But to get to this stage we need to process raw data from each customer differently.

-werners-
Esteemed Contributor III

What I mean is that cloud storage has its limits in what it can process all at once.

Apparently in your case it is not (yet) an issue if you execute the writes at the same time.

Is it an option to process certain jobs sequentially? Or to group customers with the same transformations?

Another workspace could also be a (less optimal) solution, or talking to your Databricks contact.

Thinking about your use case, I would try to build some kind of framework that enables you to manage the processing more dynamically (a rough sketch follows).

Easier said than done, I know 🙂 But right now every new customer is a new Spark script, and that is a pain to manage.
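A rough illustration of what such a framework could look like: one parameterized entry point that looks up the per-customer transformation, so a new customer becomes a config entry rather than a new Spark script. All names here are made up for the sketch.

    # Hedged sketch: a registry of per-customer transformations driven by a job
    # parameter, so one job definition serves many customers.
    from pyspark.sql import DataFrame, SparkSession

    def transform_format_a(df: DataFrame) -> DataFrame:
        return df.withColumnRenamed("ts", "event_time")   # illustrative

    def transform_format_b(df: DataFrame) -> DataFrame:
        return df.dropDuplicates(["event_id"])             # illustrative

    # Registering a new customer is a dictionary entry, not a new script.
    TRANSFORMS = {
        "customer_a": transform_format_a,
        "customer_b": transform_format_b,
    }

    def run(customer: str, source_path: str, target_table: str) -> None:
        spark = SparkSession.builder.getOrCreate()
        raw = spark.read.json(source_path)                 # source format is an assumption
        unified = TRANSFORMS[customer](raw)
        unified.write.mode("append").saveAsTable(target_table)

    # Example invocation from a single parameterized job run:
    # run("customer_a", "s3://raw/customer_a/2023-01-01", "unified.events")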

hari
Contributor

OK, got it.

Regarding doing the jobs sequentially: yeah, that is an option we are considering, but as a last resort. Grouping could also be a possible solution, but it needs work on properly defining the transformations.

Yeah, we are looking for ways to create a framework to do this transformation. We were considering dbx or our own framework, but this will be really hard to finish, as you mentioned.

Anyway, thanks a lot for your views on this.

hari
Contributor

Hi @Kaniz Fatma

-werners- gave some great suggestions on the issue we are dealing with, but some confirmation on the question from the Databricks side would be much appreciated.
