Data Engineering

Workflows 7 second delay between tasks

bergmaal
New Contributor III

When you have a job in Workflows with multiple tasks running one after another, there seems to be a consistent 7-second delay between task executions. More precisely, every task has roughly 7 seconds of overhead before its code actually runs. Does anybody know why, or whether there is a workaround?

We've not tested this in every possible setup, but here's what we did:

- Created a notebook containing a single print statement, print("Hello world"), which takes milliseconds to execute in the notebook itself.
- Created a job with 3 or more tasks, each running that same notebook.
- Ran the job on both a job cluster and an all-purpose cluster (driver + 2 workers, 4 cores each).

In every configuration, each task took about 7 seconds to complete.
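One way to quantify this overhead is to compare each task's wall-clock time against its execution time in the Jobs API "runs get" response. A minimal sketch, assuming the documented millisecond duration fields (start_time, end_time, execution_duration); the sample payload below is fabricated to mirror the behaviour described above, not real run data:

```python
# Sketch: estimate per-task overhead from a Databricks Jobs API 2.1
# "runs get" response dict. Assumes start_time, end_time and
# execution_duration are present and in milliseconds.

def task_overheads_ms(run: dict) -> dict:
    """Map each task_key to wall-clock time NOT spent executing user code (ms)."""
    overheads = {}
    for task in run.get("tasks", []):
        wall_clock = task["end_time"] - task["start_time"]
        executing = task.get("execution_duration", 0)
        overheads[task["task_key"]] = wall_clock - executing
    return overheads

# Fabricated example: a "Hello world" task that takes ~7 s end to end
# but only runs user code for a few hundred milliseconds.
sample_run = {
    "tasks": [
        {"task_key": "task_1", "start_time": 0, "end_time": 7000,
         "execution_duration": 200},
        {"task_key": "task_2", "start_time": 7000, "end_time": 14100,
         "execution_duration": 300},
    ]
}

print(task_overheads_ms(sample_run))
# {'task_1': 6800, 'task_2': 6800}
```

If most of the ~7 seconds shows up as overhead rather than execution_duration, the time is going to task setup rather than the notebook code itself.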

This delay might be negligible on larger jobs, but we have some smaller jobs that need to run often. If we use workflow tasks, these delays will in some cases double the run time, which is unacceptable.

2 REPLIES

jose_gonzalez
Moderator

Could you please share Spark UI screenshots showing the delay in these tasks? You will also need to look at the driver's logs.

JensH
New Contributor III

Hi @bergmaal , I am experiencing the same issue.
My Databricks consultant suggested opening a support ticket as this should not be normal behavior.

Did you solve this issue yet?

We observed that these delays do not seem to occur in workflows that use notebooks stored in the Workspace.

We observed the delays mainly when tasks reference notebooks in Git repositories by "branch" or "commit" (example in attached image).
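For reference, this is roughly what a job definition that pins its notebook to a Git branch looks like in the Jobs API 2.1 format (the repository URL, names, and paths here are placeholders, not taken from the thread):

```json
{
  "name": "git-notebook-job",
  "git_source": {
    "git_url": "https://github.com/example-org/example-repo",
    "git_provider": "gitHub",
    "git_branch": "main"
  },
  "tasks": [
    {
      "task_key": "task_1",
      "notebook_task": {
        "notebook_path": "notebooks/hello_world",
        "source": "GIT"
      }
    }
  ]
}
```

A task run against a Git source has to fetch the repository content before the notebook starts, which is one plausible place for the extra seconds to go; a task with "source": "WORKSPACE" skips that step.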