11-18-2021 02:28 AM
In an ADF pipeline, can we tell a notebook to exit and proceed to another notebook after some threshold value, like 15 minutes? For example, I have a pipeline with notebooks scheduled in sequence, and I want the pipeline to run each notebook for a certain period and then move on to the next one if the previous one doesn't complete within that specified time limit.
11-18-2021 03:01 AM
When you say 'pipeline' I assume you mean a Data Factory Pipeline.
You can do this by setting the 'Timeout' value in the General tab of the Databricks Notebook activity.
By default this is 7 days, I think; you can set it to 15 minutes.
The timeout will make the activity fail in ADF, so make sure the subsequent activities can still be processed (connect them via the 'On Completion' blue arrow or the 'On Failure' red one, rather than only the green 'On Success' path).
One thing I am not sure of is whether ADF also cancels the actual Databricks run, or only the ADF activity (while the notebook keeps running in Databricks). That is easy to test, though.
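For reference, if you define the pipeline in code rather than in the UI, a minimal sketch with the azure-mgmt-datafactory Python SDK could look like this (the activity name, notebook path, and linked service name are illustrative placeholders, not anything from your pipeline):

# Sketch: a Databricks Notebook activity capped at 15 minutes.
# ADF timeouts use the d.hh:mm:ss format; the default is "7.00:00:00".
from azure.mgmt.datafactory.models import (
    ActivityPolicy,
    DatabricksNotebookActivity,
    LinkedServiceReference,
)

notebook_activity = DatabricksNotebookActivity(
    name="RunMyNotebook",  # hypothetical activity name
    notebook_path="/Shared/my_notebook",  # hypothetical notebook path
    linked_service_name=LinkedServiceReference(
        type="LinkedServiceReference",
        reference_name="AzureDatabricksLinkedService",  # hypothetical linked service
    ),
    policy=ActivityPolicy(timeout="0.00:15:00"),  # fail the activity after 15 minutes
)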
11-18-2021 02:47 AM
Hi @Vibhor Sethi, if it's a pipeline, then it has to follow the flow. Skipping a task and moving on to the next one is not available for now.
11-22-2021 08:31 AM
@Werner Stinckens - yes, this approach worked. Thanks!
11-18-2021 03:18 AM
Exactly as @Werner Stinckens said.
Additionally, I know it is not a perfect architecture, but when ADF runs a notebook, that notebook can in turn run another notebook with a specified timeout:
dbutils.notebook.run(notebook, timeout)
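For example, a parent notebook could walk the sequence itself and keep going when a child times out. This is only a sketch, assuming it runs inside a Databricks notebook (where dbutils is available); the notebook paths and arguments are illustrative:

# Run each child notebook with a 15-minute (900-second) timeout and
# move on to the next one if it does not finish in time.
notebooks = ["/Shared/step_1", "/Shared/step_2"]  # hypothetical paths

for nb in notebooks:
    try:
        # dbutils.notebook.run raises an exception if the child run
        # exceeds the timeout (in seconds).
        result = dbutils.notebook.run(nb, 900, {"run_date": "2021-11-18"})
        print(f"{nb} finished: {result}")
    except Exception as e:
        print(f"{nb} timed out or failed, continuing: {e}")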
11-18-2021 08:17 AM
Hi @Vibhor Sethi ,
There is a global timeout in Azure Data Factory (ADF) that you can use to stop the pipeline. In addition, you can use the notebook timeout if you want to control it from your Databricks job.
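If you go the Databricks-job route, the Jobs API lets you cap a run with timeout_seconds. A minimal sketch (the workspace URL, token, cluster ID, and notebook path are placeholders):

# Sketch: create a job whose notebook run is stopped after 15 minutes
# via timeout_seconds (Databricks Jobs API 2.1).
import requests

resp = requests.post(
    "https://<databricks-instance>/api/2.1/jobs/create",  # placeholder host
    headers={"Authorization": "Bearer <personal-access-token>"},  # placeholder token
    json={
        "name": "my-notebook-job",
        "timeout_seconds": 900,  # stop the run after 15 minutes
        "tasks": [
            {
                "task_key": "main",
                "notebook_task": {"notebook_path": "/Shared/my_notebook"},
                "existing_cluster_id": "<cluster-id>",  # placeholder cluster
            }
        ],
    },
)
resp.raise_for_status()
print(resp.json())  # returns the new job_id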