<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Workflow concurrent runs not working as expected in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/workflow-concurrent-runs-not-working-as-expected/m-p/96759#M39343</link>
    <description>&lt;P&gt;So you use a loop to go through the metadata table and then retrieve and ingest the data using JDBC?&lt;/P&gt;&lt;P&gt;If so, then &lt;EM&gt;concurrent runs&lt;/EM&gt; won't help. &lt;EM&gt;Concurrent runs&lt;/EM&gt; is the maximum number of runs of that job that can execute side by side. For you, this would probably mean ingesting the same data 6 times if you ran the job 6 times.&lt;/P&gt;&lt;P&gt;If you want to retrieve and ingest those tables concurrently, you can either:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Split the processing of individual tables into separate tasks of the job. Tasks that don't depend on each other run concurrently.&lt;/LI&gt;&lt;LI&gt;Use language-specific concurrency methods. I don't know what your code looks like now, so I can't say more about this option.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;If it's easy for you to describe the process as a DAG (directed acyclic graph), I'd say that using Databricks tasks is pretty straightforward. You could also try out &lt;A href="https://docs.databricks.com/en/jobs/for-each.html" target="_blank"&gt;https://docs.databricks.com/en/jobs/for-each.html&lt;/A&gt;, but I'm not sure how concurrency works with that one.&lt;/P&gt;</description>
    <pubDate>Wed, 30 Oct 2024 07:15:43 GMT</pubDate>
    <dc:creator>elguitar</dc:creator>
    <dc:date>2024-10-30T07:15:43Z</dc:date>
    <item>
      <title>Workflow concurrent runs not working as expected</title>
      <link>https://community.databricks.com/t5/data-engineering/workflow-concurrent-runs-not-working-as-expected/m-p/96166#M39235</link>
      <description>&lt;P&gt;Hello all,&lt;/P&gt;&lt;P&gt;I am trying to fetch data from different sources for tables listed in a metadata table. For each table in the metadata table, the data is fetched from its source using the JDBC connector. A scheduled job is responsible for fetching the data for each table. Now that there is a large number of new tables, I want faster, more efficient ingestion through parallel processing. I tried the Maximum concurrent runs setting in the workflow and expected 6 parallel runs with concurrent runs = 6, but it shows only one run. Does this happen at the executor level? What is the expected behavior of the Maximum concurrent runs option?&lt;/P&gt;</description>
      <pubDate>Fri, 25 Oct 2024 15:41:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/workflow-concurrent-runs-not-working-as-expected/m-p/96166#M39235</guid>
      <dc:creator>Andolina</dc:creator>
      <dc:date>2024-10-25T15:41:39Z</dc:date>
    </item>
    <item>
      <title>Re: Workflow concurrent runs not working as expected</title>
      <link>https://community.databricks.com/t5/data-engineering/workflow-concurrent-runs-not-working-as-expected/m-p/96291#M39255</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;It seems the run is getting queued. It might be due to the following settings (except the third):&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="1000037335.png" style="width: 1440px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/12320iEEB8780FFA9B083A/image-size/medium?v=v2&amp;amp;px=400" role="button" title="1000037335.png" alt="1000037335.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 26 Oct 2024 21:02:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/workflow-concurrent-runs-not-working-as-expected/m-p/96291#M39255</guid>
      <dc:creator>AngadSingh</dc:creator>
      <dc:date>2024-10-26T21:02:56Z</dc:date>
    </item>
    <item>
      <title>Re: Workflow concurrent runs not working as expected</title>
      <link>https://community.databricks.com/t5/data-engineering/workflow-concurrent-runs-not-working-as-expected/m-p/96691#M39325</link>
      <description>&lt;P&gt;Hi Angad,&lt;/P&gt;&lt;P&gt;No, the runs are not getting queued. Since this property is set at the job level, I was expecting the runs to execute concurrently or get queued, but we always see only 1 run of the workflow, even when concurrent runs is set to 6.&lt;/P&gt;</description>
      <pubDate>Tue, 29 Oct 2024 17:51:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/workflow-concurrent-runs-not-working-as-expected/m-p/96691#M39325</guid>
      <dc:creator>Andolina</dc:creator>
      <dc:date>2024-10-29T17:51:12Z</dc:date>
    </item>
    <item>
      <title>Re: Workflow concurrent runs not working as expected</title>
      <link>https://community.databricks.com/t5/data-engineering/workflow-concurrent-runs-not-working-as-expected/m-p/96746#M39335</link>
      <description>&lt;P&gt;The Maximum concurrent runs parameter allows multiple runs of the same workflow to execute in parallel. Since you've switched the queue parameter on, any runs beyond 6 will be queued. This only applies when the same workflow is triggered multiple times.&lt;BR /&gt;We can help you better if you provide more details on your workflow setup, such as how it is triggered and whether it is 1 workflow or multiple workflows.&lt;BR /&gt;You've mentioned that only 1 workflow is running. You've also mentioned there is a scheduled job for each table. Is it the same job/workflow for all tables, or a different one for each? Since you have scheduled your job at a certain time, how is it getting triggered multiple times?&lt;BR /&gt;If you've scheduled multiple jobs that all use the same notebook with different parameters, the Maximum concurrent runs parameter will not help you.&lt;/P&gt;</description>
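      <content:encoded><![CDATA[
The behavior described above can be sketched as a small decision function. This is a simplified model for illustration only, not the Databricks scheduler's actual implementation. The function name decide_trigger is hypothetical, and the settings keys max_concurrent_runs and queue are assumed to match the job settings; check them against the Jobs API reference before relying on them.

```python
# Assumed job settings, mirroring the thread: 6 concurrent runs, queue on.
JOB_SETTINGS = {
    "max_concurrent_runs": 6,
    "queue": {"enabled": True},
}


def decide_trigger(active_runs, settings):
    """Return what happens to ONE new trigger of the same job.

    Simplified model: a trigger starts a run while the number of active
    runs is below the limit; otherwise it is queued (queue on) or
    skipped (queue off).
    """
    if active_runs < settings["max_concurrent_runs"]:
        return "start"
    if settings["queue"]["enabled"]:
        return "queue"
    return "skip"


if __name__ == "__main__":
    # A single scheduled trigger with nothing running starts exactly one run.
    print(decide_trigger(0, JOB_SETTINGS))
    # Only a seventh overlapping trigger is affected by the limit.
    print(decide_trigger(6, JOB_SETTINGS))
```

In this model the setting never creates runs by itself: one scheduled trigger still yields one run, which would match what the original poster observed.
]]></content:encoded>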
      <pubDate>Wed, 30 Oct 2024 05:16:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/workflow-concurrent-runs-not-working-as-expected/m-p/96746#M39335</guid>
      <dc:creator>Edthehead</dc:creator>
      <dc:date>2024-10-30T05:16:10Z</dc:date>
    </item>
    <item>
      <title>Re: Workflow concurrent runs not working as expected</title>
      <link>https://community.databricks.com/t5/data-engineering/workflow-concurrent-runs-not-working-as-expected/m-p/96759#M39343</link>
      <description>&lt;P&gt;So you use a loop to go through the metadata table and then retrieve and ingest the data using JDBC?&lt;/P&gt;&lt;P&gt;If so, then &lt;EM&gt;concurrent runs&lt;/EM&gt; won't help. &lt;EM&gt;Concurrent runs&lt;/EM&gt; is the maximum number of runs of that job that can execute side by side. For you, this would probably mean ingesting the same data 6 times if you ran the job 6 times.&lt;/P&gt;&lt;P&gt;If you want to retrieve and ingest those tables concurrently, you can either:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Split the processing of individual tables into separate tasks of the job. Tasks that don't depend on each other run concurrently.&lt;/LI&gt;&lt;LI&gt;Use language-specific concurrency methods. I don't know what your code looks like now, so I can't say more about this option.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;If it's easy for you to describe the process as a DAG (directed acyclic graph), I'd say that using Databricks tasks is pretty straightforward. You could also try out &lt;A href="https://docs.databricks.com/en/jobs/for-each.html" target="_blank"&gt;https://docs.databricks.com/en/jobs/for-each.html&lt;/A&gt;, but I'm not sure how concurrency works with that one.&lt;/P&gt;</description>
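      <content:encoded><![CDATA[
For the second option (language-specific concurrency), a minimal Python sketch using the standard library's thread pool might look like the following. It is an illustration under assumptions, not the poster's code: the TABLES list stands in for rows read from the metadata table, and ingest_table is a hypothetical placeholder where the per-table JDBC read and write would go.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical stand-in for the table names read from the metadata table.
TABLES = ["customers", "orders", "invoices", "payments"]


def ingest_table(table_name):
    """Placeholder for the per-table work (JDBC read, then write).

    Replace the body with the real ingestion logic; here it only
    reports success so the sketch runs anywhere.
    """
    return "done"


def ingest_all(tables, max_workers=6):
    """Run ingest_table for every table, at most max_workers at a time."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each submitted future back to its table name.
        futures = {pool.submit(ingest_table, name): name for name in tables}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # one failing table should not stop the rest
                results[name] = f"failed: {exc}"
    return results


if __name__ == "__main__":
    for name, status in sorted(ingest_all(TABLES).items()):
        print(name, status)
```

Threads are a reasonable fit here because each call mostly waits on the remote database and the cluster, so the driver is not doing heavy computation. The max_workers value of 6 simply mirrors the number in the question; a sensible limit depends on the source systems and the cluster size.
]]></content:encoded>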
      <pubDate>Wed, 30 Oct 2024 07:15:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/workflow-concurrent-runs-not-working-as-expected/m-p/96759#M39343</guid>
      <dc:creator>elguitar</dc:creator>
      <dc:date>2024-10-30T07:15:43Z</dc:date>
    </item>
  </channel>
</rss>

