
ADF Pipeline - Notebook Run time

Vibhor
Contributor

In an ADF pipeline, can we specify that a notebook should be exited and the pipeline should proceed to the next notebook after some threshold, like 15 minutes? For example, I have a pipeline with notebooks scheduled in sequence, and I want the pipeline to let a notebook run for a certain period and then move on to the next one if the previous one doesn't complete within that time limit.


5 REPLIES

Prabakar
Databricks Employee

Hi @Vibhor Sethi, if it's a pipeline, then it has to follow the flow. Skipping a task and moving to the next is not available for now.

-werners-
Esteemed Contributor III

When you say 'pipeline' I assume you mean a Data Factory pipeline.

You can do this by setting the 'Timeout' value in the General tab of the Databricks Notebook activity.

By default this is 7 days, I think; you can set it to 15 minutes.

Hitting the timeout will throw an error in ADF, so make sure the subsequent activities can still be processed by connecting them with a failure (red) or completion (blue) dependency arrow, not only a success one.

One thing I am not sure of is whether the actual Databricks run is cancelled by ADF as well, or whether only the ADF activity is cancelled (while the notebook keeps running in Databricks). That is easy to test, though.
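
For anyone who prefers to set this in the pipeline JSON rather than in the UI, here is a minimal sketch of what the activity definition might look like; the activity name, notebook path, and linked service name are made-up examples, and the policy timeout uses ADF's d.hh:mm:ss format (15 minutes below):

{
    "name": "RunChildNotebook",
    "type": "DatabricksNotebook",
    "policy": {
        "timeout": "0.00:15:00",
        "retry": 0
    },
    "typeProperties": {
        "notebookPath": "/Shared/child_notebook"
    },
    "linkedServiceName": {
        "referenceName": "AzureDatabricksLinkedService",
        "type": "LinkedServiceReference"
    }
}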

@Werner Stinckens - yes, this approach worked, thanks

Hubert-Dudek
Esteemed Contributor III

Exactly as @Werner Stinckens said.

Additionally (I know it is not perfect architecture), when ADF runs a notebook, that notebook can run another notebook with a specified timeout:

dbutils.notebook.run(notebook_path, timeout_seconds)  # timeout is in seconds; 900 = 15 minutes
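
A minimal sketch of that parent/child pattern, assuming a child notebook at /Shared/child_notebook (a made-up path) and that this runs inside a Databricks notebook where dbutils is available:

# Run the child notebook with a 15-minute (900 second) timeout.
# dbutils.notebook.run raises an exception if the child does not finish in time,
# so the parent can catch it and carry on with the next step.
try:
    result = dbutils.notebook.run("/Shared/child_notebook", 900)
    print(f"Child notebook finished with result: {result}")
except Exception as err:
    print(f"Child notebook timed out or failed, continuing: {err}")

# ...continue with the next notebook or the rest of the parent logic here.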

jose_gonzalez
Databricks Employee

Hi @Vibhor Sethi,

There is a global timeout in Azure Data Factory (ADF) that you can use to stop the pipeline. In addition, you can use the notebook timeout in case you want to control it from your Databricks job.
