Catch when a notebook fails and terminate command in threaded parallel notebook run

cmilligan
Contributor II

I have a command that runs notebooks in parallel using threading. I want the command to fail whenever one of the running notebooks fails; right now it just keeps running.

Below is the code I'm currently running:

from queue import Queue
from threading import Thread

q = Queue()
worker_count = 3

def run_notebook(notebook):
    print(notebook)
    dbutils.notebook.run(notebook, 30, {"begin_date": begin_date, "end_date": end_date})

def run_tasks(function, q):
    while not q.empty():
        value = q.get()
        function(value)
        q.task_done()

# Run each step's notebooks in parallel, waiting for the step to
# finish before starting the next one.
for step in notebook_steps:
    for notebook in step:
        q.put(notebook)

    for _ in range(worker_count):
        t = Thread(target=run_tasks, args=(run_notebook, q))
        t.daemon = False
        t.start()

    q.join()
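For context on why the command keeps running: an exception raised inside a `Thread` target kills only that thread; it is never re-raised on the driver. One way to surface failures is to have each worker record its exceptions and re-raise them after `q.join()`. This is a minimal sketch in plain Python, with a hypothetical `fake_notebook` standing in for `dbutils.notebook.run` (the `run_all` helper and its names are illustrative, not a Databricks API):

```python
from queue import Queue, Empty
from threading import Thread

def run_tasks(function, q, errors):
    # Record any exception instead of letting the thread die silently.
    # get_nowait avoids a race where empty() was False but another
    # worker drained the queue before our blocking get().
    while True:
        try:
            value = q.get_nowait()
        except Empty:
            return
        try:
            function(value)
        except Exception as e:
            errors.append((value, e))
        finally:
            q.task_done()

def run_all(function, items, worker_count=3):
    q = Queue()
    for item in items:
        q.put(item)
    errors = []
    threads = [Thread(target=run_tasks, args=(function, q, errors))
               for _ in range(worker_count)]
    for t in threads:
        t.start()
    q.join()
    # Fail the whole command if any worker recorded a failure.
    if errors:
        raise RuntimeError(f"{len(errors)} notebook(s) failed: {errors}")

# Hypothetical stand-in for dbutils.notebook.run.
def fake_notebook(name):
    if name == "bad":
        raise ValueError("notebook failed")
```

Replacing `fake_notebook` with a wrapper around `dbutils.notebook.run` would make the driver command raise as soon as the step finishes with any failed notebook.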

1 ACCEPTED SOLUTION

Kaniz
Community Manager

Hi @Coleman Milligan,

You can run multiple Azure Databricks notebooks in parallel using the dbutils library.

Here is Python code based on the sample code from the Azure Databricks documentation on running notebooks concurrently and on notebook workflows, with additional parameterization, retry logic, and error handling.

Note that all child notebooks share resources on the cluster, which can cause bottlenecks and failures under resource contention. It may be better to run parallel jobs on their own dedicated clusters using the Jobs API.
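The sample code referenced above is not shown in the thread. A minimal sketch of that pattern using concurrent.futures, where worker exceptions propagate to the driver via `future.result()` (the helper names and the injected `run_fn` are illustrative, not the original sample; on Databricks, `run_fn` would wrap `dbutils.notebook.run`):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_with_retry(run_fn, notebook, args, max_retries=2):
    # Retry a failing notebook a few times before giving up.
    last_error = None
    for _ in range(max_retries + 1):
        try:
            return run_fn(notebook, args)
        except Exception as e:
            last_error = e
    raise RuntimeError(
        f"{notebook} failed after {max_retries + 1} attempts"
    ) from last_error

def run_in_parallel(run_fn, notebooks, args, max_workers=3):
    # Submit every notebook to a thread pool; future.result() re-raises
    # any exception from the worker thread, so one notebook failure
    # fails the driver command instead of being silently swallowed.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_with_retry, run_fn, nb, args): nb
                   for nb in notebooks}
        return {futures[f]: f.result() for f in as_completed(futures)}
```

On Databricks, `run_fn` could be `lambda nb, args: dbutils.notebook.run(nb, 30, args)`; a notebook that keeps failing after its retries raises on the driver, which is the behavior asked for in the question.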

