10-25-2024 08:41 AM
Hello All,
I am fetching data from different sources into tables, driven by a metadata table. Data is pulled from the sources with the JDBC connector for each table listed in the metadata table, and a scheduled job is responsible for fetching the data for each table. Now, with a large number of new tables, I want a faster and more effective way of ingesting the data using parallel processing. I tried the Maximum concurrent runs setting on the workflow and expected 6 parallel runs when I set concurrent runs = 6, but it only ever shows one run. Does this happen at the executor level? What is the expected behavior of the Maximum concurrent runs option?
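For illustration, the current ingestion loop looks roughly like this (a simplified sketch; the metadata table name and columns are placeholders):

```python
# Simplified sketch of the metadata-driven loop described above.
# `spark` is the SparkSession predefined in a Databricks notebook;
# the metadata table name and its columns are placeholders.
meta_rows = spark.table("ingest_metadata").collect()  # one row per source table

for row in meta_rows:
    df = (
        spark.read.format("jdbc")
        .option("url", row["jdbc_url"])           # source connection string
        .option("dbtable", row["source_table"])   # source table to pull
        .option("user", row["jdbc_user"])
        .option("password", row["jdbc_password"])
        .load()
    )
    # Land the data in the corresponding target table
    df.write.mode("overwrite").saveAsTable(row["target_table"])
```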
Labels: Workflows
Accepted Solutions
10-30-2024 12:15 AM
So... you use a loop to go through the metadata table and then retrieve and ingest the data using JDBC?
If so, concurrent runs won't help. Maximum concurrent runs is the number of runs of that same job that can execute side by side. In your case, this would probably mean ingesting the same data 6 times if you were to trigger the job 6 times.
If you want to retrieve and ingest those tables concurrently, you can either:
- Split the processing of individual tables into separate tasks of the job. Tasks that don't depend on each other are run concurrently.
- Use language-specific concurrency methods. I don't know what your code looks like now, so I can't say much more, but one common pattern is sketched below this post.
If the process is easy to describe as a DAG (directed acyclic graph), I'd say that using Databricks tasks is pretty straightforward. You could also try the for-each task (https://docs.databricks.com/en/jobs/for-each.html), but I'm not sure how concurrency works with that one.
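For the second option, a driver-side thread pool is a common pattern, since JDBC reads spend most of their time waiting on the source database. A hedged sketch, reusing the placeholder metadata columns from the question:

```python
# Hedged sketch: parallel JDBC ingestion with a driver-side thread pool.
# The SparkSession is thread-safe, and JDBC reads mostly wait on the source
# database, so driver threads overlap well.
# Table and column names are placeholders, not from the original post.
from concurrent.futures import ThreadPoolExecutor, as_completed

def ingest_one(row):
    df = (
        spark.read.format("jdbc")
        .option("url", row["jdbc_url"])
        .option("dbtable", row["source_table"])
        .option("user", row["jdbc_user"])
        .option("password", row["jdbc_password"])
        .load()
    )
    df.write.mode("overwrite").saveAsTable(row["target_table"])
    return row["target_table"]

meta_rows = spark.table("ingest_metadata").collect()

# Cap the pool so neither the cluster nor the source database is overwhelmed.
with ThreadPoolExecutor(max_workers=6) as pool:
    futures = [pool.submit(ingest_one, r) for r in meta_rows]
    for f in as_completed(futures):
        print(f"done: {f.result()}")
```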
10-26-2024 02:02 PM
Hi,
It seems the run is getting queued. It might be due to the job's queue-related settings.
10-29-2024 10:51 AM
Hi Angad,
No, the runs are not getting queued. Since this property is at the job level, I expected runs either to execute concurrently or to get queued, but we only ever see one run of the workflow, even with Maximum concurrent runs set to 6.
10-29-2024 10:16 PM
The Maximum concurrent runs parameter allows multiple runs of the same workflow to execute in parallel. Since you've switched the queue parameter on, anything beyond 6 will be queued. This only applies when the same workflow is triggered multiple times.
We can help you better if you provide more details on your workflow setup and how it is triggered: is it one workflow or multiple workflows?
You've mentioned that only one workflow is running, and also that there is a scheduled job for each table. Is it the same job/workflow for all tables, or a different one for each? Since you've scheduled your job for a certain time, how is it getting triggered multiple times?
If you've scheduled multiple jobs that all use the same notebook with different parameters, the Maximum concurrent runs parameter will not help you.
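For reference, both settings live at the job level; here is a sketch of setting them on an existing job through the Jobs 2.1 API (field names follow the public API; the workspace URL, token, and job ID are placeholders):

```python
# Hedged sketch: enabling queueing and raising max concurrent runs on an
# existing job via the Jobs 2.1 API. All credentials and IDs are placeholders.
import requests

HOST = "https://<workspace-url>"    # placeholder
TOKEN = "<personal-access-token>"   # placeholder

resp = requests.post(
    f"{HOST}/api/2.1/jobs/update",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "job_id": 123,  # placeholder job ID
        "new_settings": {
            # Up to 6 runs of this same job may execute side by side...
            "max_concurrent_runs": 6,
            # ...and triggers beyond that wait in the queue instead of being skipped.
            "queue": {"enabled": True},
        },
    },
)
resp.raise_for_status()
```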