ADF Pipeline - Notebook Run time

Vibhor
Contributor

In an ADF pipeline, can we specify that a notebook should exit and the pipeline proceed to another notebook after some threshold value, like 15 minutes? For example, I have a pipeline with notebooks scheduled in sequence; I want the pipeline to run each notebook for a certain period and then move on to the next one if the previous one doesn't complete within that time limit.

1 ACCEPTED SOLUTION


-werners-
Esteemed Contributor III

When you say 'pipeline' I assume you mean a Data Factory pipeline.

You can do this by setting the 'Timeout' value in the General tab of the Databricks Notebook activity.

By default this is 7 days, I think; you can set it to 15 minutes.

A timeout will make the activity fail in ADF, so make sure the subsequent activities can still be processed: connect them with the completion or failure path (the blue or red arrow), not only the success path.

One thing I am not sure of is whether ADF also cancels the actual Databricks run, or whether only the ADF activity is cancelled (while the notebook keeps running in Databricks). That is easy to test, though.
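For reference, the timeout described above lives on the activity's policy in the pipeline JSON. A minimal sketch, with the activity name and notebook path invented for illustration:

```json
{
  "name": "RunNotebook",
  "type": "DatabricksNotebook",
  "policy": {
    "timeout": "0.00:15:00",
    "retry": 0
  },
  "typeProperties": {
    "notebookPath": "/Shared/my-notebook"
  }
}
```

The timeout uses ADF's `d.hh:mm:ss` duration format, so `0.00:15:00` is 15 minutes.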


5 REPLIES

Prabakar
Esteemed Contributor III

Hi @Vibhor Sethi​, if it's a pipeline, then it has to follow the flow. Skipping a task and moving on to the next one is not available for now.


@Werner Stinckens​ - yes, this approach worked, thanks!

Hubert-Dudek
Esteemed Contributor III

Exactly as @Werner Stinckens​ said.

Additionally (I know it is not a perfect architecture), when ADF runs a notebook, that notebook can itself run another notebook with a specified timeout:

dbutils.notebook.run(notebook_path, timeout_seconds)
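You can't exercise dbutils outside Databricks, but the same "wait at most N seconds for a step, then move on" pattern can be sketched in plain Python. The step callables below are illustrative stand-ins for notebook runs, not Databricks APIs:

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def run_steps_with_timeout(steps, timeout_seconds):
    """Run each (name, callable) step in sequence, waiting at most
    timeout_seconds for it; if a step exceeds the limit, record that
    and move on to the next step instead of failing the whole run."""
    results = {}
    with ThreadPoolExecutor() as pool:
        for name, step in steps:
            future = pool.submit(step)
            try:
                results[name] = future.result(timeout=timeout_seconds)
            except FutureTimeout:
                # Like the ADF activity timeout, this stops *waiting* for
                # the step but does not forcibly kill the work itself.
                results[name] = "timed out"
    return results
```

Note the caveat in the comment: as with the ADF activity timeout discussed above, giving up on the wait is not the same as cancelling the underlying work.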

jose_gonzalez
Moderator

Hi @Vibhor Sethi​ ,

There is a global timeout in Azure Data Factory (ADF) that you can use to stop the pipeline. In addition, you can use the notebook timeout in case you want to control it from your Databricks job.
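If you want the timeout on the Databricks side, job definitions in the Jobs API accept a `timeout_seconds` field at both the job and task level. A minimal sketch, with the job name and notebook path invented for illustration:

```json
{
  "name": "nightly-notebooks",
  "timeout_seconds": 900,
  "tasks": [
    {
      "task_key": "step1",
      "notebook_task": { "notebook_path": "/Shared/step1" },
      "timeout_seconds": 900
    }
  ]
}
```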
