06-16-2022 11:06 PM
Per the documentation, a workspace is limited to 1k concurrent job runs.
Can somebody clarify how this concurrency limit applies, i.e. is it (1) a total of 1,000 concurrent runs across all jobs in the workspace, or (2) 1,000 concurrent runs per job?
Also, is there any way to increase this limit?
If so, what is the hard limit and how do we increase the maximum concurrency?
06-17-2022 01:58 AM
I think it is the first one.
Increasing that does not seem to be possible.
That is how I interpret the docs anyway.
06-17-2022 02:21 AM
Yeah, the documentation is worded like the first option.
But the API docs also list the maximum concurrency for a single job as 1,000. While that fits the first option, it could also mean the second one.
Some confirmation would be good.
Also, 1,000 concurrent job runs per workspace seems low.
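For reference, the per-job setting being discussed is max_concurrent_runs in the job definition. A minimal sketch of setting it when creating a job through the Jobs API 2.1 (the workspace URL, token, notebook path, and cluster ID below are placeholders, not from this thread):

```python
# Hedged sketch: setting per-job concurrency via the Databricks Jobs API 2.1.
# Host, token, notebook path, and cluster ID are placeholders.
import requests

host = "https://<workspace-url>"        # placeholder
token = "<personal-access-token>"       # placeholder

job_spec = {
    "name": "per-customer-ingest",
    # Per-job cap on simultaneous runs; the API docs list 1000 as the maximum value.
    "max_concurrent_runs": 1000,
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Repos/etl/ingest"},  # placeholder path
            "existing_cluster_id": "<cluster-id>",                    # placeholder
        }
    ],
}

resp = requests.post(
    f"{host}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=job_spec,
)
resp.raise_for_status()
print(resp.json())  # returns the new job_id
```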
06-17-2022 02:26 AM
Well, technically 1,000 runs of the same job amount to the same load as one run each of 1,000 different jobs.
But I agree the docs could be clearer on that.
1,000 concurrent jobs does not seem low to me. It would mean you run 1,000 Spark jobs simultaneously, which is quite a lot. If each job uses only 4 vCores (probably it is more), we are already at 4,000 vCores. If those 1,000 jobs also start writing simultaneously to your data lake or database, you will run into performance issues.
06-17-2022 02:39 AM
While I do agree with your statement, I am not clear what cost Databricks would incur by allowing more than 1,000 concurrent job runs.
We have some use cases where it would be best to have a job per customer, and in that case we can easily cross 1,000 concurrent runs as we scale. Similarly, all the jobs could be writing to different tables.
I don't think we should assume that all the jobs would be writing to the same table, and even if they are, we can handle the concurrent writes by properly partitioning the table and using the partition keys in the merge queries.
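To illustrate that last point, a minimal sketch with Delta Lake: pinning the partition key in the MERGE condition so that concurrent writers touch disjoint partitions and don't conflict with each other. The table name, column names, and customer ID are invented for the example:

```python
# Hedged sketch: a MERGE restricted to one customer's partition, so jobs for
# different customers can merge into the same table concurrently.
# Table name, column names, paths, and customer_id are illustrative only.
from delta.tables import DeltaTable

customer_id = "acme"  # each job run handles a single customer

updates_df = spark.read.parquet(f"/raw/{customer_id}/latest")  # placeholder path

target = DeltaTable.forName(spark, "silver.events")

(
    target.alias("t")
    .merge(
        updates_df.alias("s"),
        # Including the partition column in the condition lets Delta prune to one
        # partition, keeping concurrent merges from clashing with each other.
        f"t.customer_id = '{customer_id}' AND t.customer_id = s.customer_id AND t.event_id = s.event_id",
    )
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```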
06-17-2022 02:44 AM
Every storage system has its limits concerning throughput. It might be the disks or the NICs.
Also, when you talk about scaling: the power of Spark lies not in the capability to run a ton of jobs in parallel, but in dividing a workload into tasks and processing those tasks in parallel on multiple workers.
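To make that concrete, a rough sketch (paths, table, and column names invented for illustration) of one job fanning the work for all customers out across workers, instead of launching one job run per customer:

```python
# Hedged sketch: one Spark job processing every customer at once, letting Spark
# parallelise by partition rather than relying on many concurrent job runs.
# Paths, table names, and columns are illustrative only.
events = spark.read.format("delta").load("/bronze/events")  # placeholder path

# Repartition by customer so each worker task handles a slice of customers.
result = (
    events.repartition("customer_id")
    .groupBy("customer_id", "event_type")
    .count()
)

result.write.format("delta").mode("overwrite").saveAsTable("silver.event_counts")
```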
06-17-2022 03:06 AM
I am not clear on why concurrency would be limited by the file system. It seems strange, since we could generate the same volume of writes with fewer than 1,000 concurrent jobs (simply by increasing the number of worker nodes or cores). So if the concurrency limit were due to a file-system limitation, it should vary with the worker node configuration.
I understand that Spark is meant to work with a large amount of data split across workers. Sorry, I might not have been clear on our use case: the task to be performed can vary with each customer.
Our pipeline formats customer data into a single unified format. After this stage we can process the entire data set with a single job, but to get to this stage we need to process the raw data from each customer differently.
06-17-2022 04:05 AM
What I mean is that cloud storage has its limits on what it can process all at once.
Apparently in your case it is not (yet) an issue if you execute the writes at the same time.
Is it an option to process certain jobs sequentially? Or to group customers that share the same transformations?
Another workspace could also be a (less optimal) solution, or talking to your Databricks contact.
Thinking about your use case, I would try to build some kind of framework that lets you manage the processing more dynamically; a sketch of what I mean follows below.
Easily said, I know 🙂 But right now every new customer is a new Spark script, and that is a pain to manage.
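Not your actual design, just a minimal sketch of the idea: a registry mapping each customer (or group of customers) to a reusable transformation, driven by a config, so one parameterised job replaces one script per customer. All names, paths, and formats here are invented for illustration:

```python
# Hedged sketch of a config-driven transformation framework.
# Customer names, transformation names, paths, and table names are invented.
from pyspark.sql import DataFrame
from pyspark.sql import functions as F

# Registry of reusable transformations; customers sharing a format reuse the same one.
def transform_csv_v1(df: DataFrame) -> DataFrame:
    return df.withColumn("amount", F.col("amount").cast("double"))

def transform_json_v2(df: DataFrame) -> DataFrame:
    return df.withColumnRenamed("amt", "amount")

TRANSFORMS = {
    "csv_v1": transform_csv_v1,
    "json_v2": transform_json_v2,
}

# Per-customer config; in practice this could live in a Delta table or a YAML file.
CUSTOMER_CONFIG = [
    {"customer": "acme", "format": "csv", "path": "/raw/acme", "transform": "csv_v1"},
    {"customer": "globex", "format": "json", "path": "/raw/globex", "transform": "json_v2"},
]

def process_customer(cfg: dict) -> None:
    """Read one customer's raw data, apply its registered transform, append to the unified table."""
    df = spark.read.format(cfg["format"]).load(cfg["path"])
    unified = TRANSFORMS[cfg["transform"]](df).withColumn("customer_id", F.lit(cfg["customer"]))
    unified.write.format("delta").mode("append").saveAsTable("silver.unified_events")

for cfg in CUSTOMER_CONFIG:
    process_customer(cfg)
```

The loop could just as well be split into one job run per group of customers, which keeps the number of concurrent runs well under the workspace limit while still avoiding a script per customer.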
06-17-2022 04:35 AM
OK, got it.
Regarding running the jobs sequentially: yeah, that is an option we are considering, but as a last resort. Grouping could also be a possible solution, but it needs work on properly defining the transformations.
We are looking for ways to build a framework for this transformation step. We were considering dbx, or our own framework, but as you mentioned that will be hard to get right.
Anyway, thanks a lot for your views on this.
06-22-2022 03:15 AM
Hi @Kaniz Fatma
Werners gave some great suggestions on the issue we are dealing with, but some confirmation on the original question from the Databricks side would be much appreciated.