<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Workflows 7 second delay between tasks in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/workflows-7-second-delay-between-tasks/m-p/55558#M30367</link>
    <description>&lt;P&gt;When a job in&amp;nbsp;Workflows has multiple tasks running one after another, there seems to be a consistent 7-second delay between tasks. More precisely, every task has an overhead of approximately 7 seconds before its code actually runs. Does anybody know why, or whether there is a workaround?&lt;/P&gt;&lt;P&gt;We have not tested this in every possible setup, but here is what we did:&lt;/P&gt;&lt;P&gt;We created a notebook with a single print statement [print("Hello world")], which takes milliseconds to execute in the notebook itself. We then created a job with 3 or more tasks, each running the same notebook. We ran the job on both a job cluster and an all-purpose cluster with a driver + 2 workers with 4 cores. When the job runs, each task takes about 7 seconds to complete.&lt;/P&gt;&lt;P&gt;This delay might be&amp;nbsp;negligible on larger jobs, but we have some smaller jobs that need to run often. If we use workflow tasks, these delays will in some cases double the run time, which is unacceptable.&lt;/P&gt;</description>
    <pubDate>Wed, 20 Dec 2023 15:03:50 GMT</pubDate>
    <dc:creator>bergmaal</dc:creator>
    <dc:date>2023-12-20T15:03:50Z</dc:date>
    <item>
      <title>Workflows 7 second delay between tasks</title>
      <link>https://community.databricks.com/t5/data-engineering/workflows-7-second-delay-between-tasks/m-p/55558#M30367</link>
      <description>&lt;P&gt;When a job in&amp;nbsp;Workflows has multiple tasks running one after another, there seems to be a consistent 7-second delay between tasks. More precisely, every task has an overhead of approximately 7 seconds before its code actually runs. Does anybody know why, or whether there is a workaround?&lt;/P&gt;&lt;P&gt;We have not tested this in every possible setup, but here is what we did:&lt;/P&gt;&lt;P&gt;We created a notebook with a single print statement [print("Hello world")], which takes milliseconds to execute in the notebook itself. We then created a job with 3 or more tasks, each running the same notebook. We ran the job on both a job cluster and an all-purpose cluster with a driver + 2 workers with 4 cores. When the job runs, each task takes about 7 seconds to complete.&lt;/P&gt;&lt;P&gt;This delay might be&amp;nbsp;negligible on larger jobs, but we have some smaller jobs that need to run often. If we use workflow tasks, these delays will in some cases double the run time, which is unacceptable.&lt;/P&gt;</description>
      <pubDate>Wed, 20 Dec 2023 15:03:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/workflows-7-second-delay-between-tasks/m-p/55558#M30367</guid>
      <dc:creator>bergmaal</dc:creator>
      <dc:date>2023-12-20T15:03:50Z</dc:date>
    </item>
    <item>
      <title>Re: Workflows 7 second delay between tasks</title>
      <link>https://community.databricks.com/t5/data-engineering/workflows-7-second-delay-between-tasks/m-p/58440#M31140</link>
      <description>&lt;P&gt;Could you please share Spark UI screenshots showing the delay for these tasks? You will also need to pay attention to the driver logs.&lt;/P&gt;</description>
      <pubDate>Thu, 25 Jan 2024 17:38:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/workflows-7-second-delay-between-tasks/m-p/58440#M31140</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2024-01-25T17:38:06Z</dc:date>
    </item>
    <item>
      <title>Re: Workflows 7 second delay between tasks</title>
      <link>https://community.databricks.com/t5/data-engineering/workflows-7-second-delay-between-tasks/m-p/60314#M31636</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/54581"&gt;@bergmaal&lt;/a&gt;, I am experiencing the same issue.&lt;BR /&gt;My Databricks consultant suggested opening a support ticket, as this should not be normal behavior.&lt;/P&gt;&lt;P&gt;Have you solved this issue yet?&lt;/P&gt;&lt;P&gt;We observed that these delays do not seem to occur in workflows that use notebooks in the "Workspace".&lt;/P&gt;&lt;P&gt;We observed the delays mainly when the tasks reference notebooks in Git repositories by "branch" or "commit" (example in attached image).&lt;/P&gt;</description>
      <pubDate>Thu, 15 Feb 2024 14:01:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/workflows-7-second-delay-between-tasks/m-p/60314#M31636</guid>
      <dc:creator>JensH</dc:creator>
      <dc:date>2024-02-15T14:01:38Z</dc:date>
    </item>
  </channel>
</rss>

