How to %run a list of notebooks in Databricks

Philblakeman
New Contributor III

I'd like to %run a list of notebooks from another Databricks notebook.

my_notebooks = ["./setup", "./do_the_main_thing", "./check_results"]
for notebook in my_notebooks:
   %run notebook

This doesn't work, of course. I don't want to use dbutils.notebook.run(), as that creates new jobs and doesn't return anything to the caller - I want everything executable and queryable from the main notebook.

I thought perhaps it might be possible to import the actual module and run the function.

?%run shows that the command points to IPython/core/magics/execution.py, and that run is a method of the ExecutionMagics class in the execution module.

So perhaps I could use execution.ExecutionMagics.run() if I created an instance of the class.

But it's beyond me - tricky, and I doubt it's an effective solution.
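For what it's worth, in stock IPython a line magic can be invoked programmatically via get_ipython().run_line_magic(). A minimal sketch of the loop idea, with the caveat that Databricks implements %run as its own notebook command rather than IPython's %run magic, so this may well not work there (the run_line_magic parameter is passed in here only for illustration):

```python
my_notebooks = ["./setup", "./do_the_main_thing", "./check_results"]

def run_all(paths, run_line_magic):
    """Invoke the 'run' line magic once per path.

    In a live IPython session, run_line_magic would be
    get_ipython().run_line_magic. On Databricks, %run is a custom
    notebook command, so this approach is unverified there.
    """
    for path in paths:
        run_line_magic("run", path)
```

In plain IPython you would call run_all(my_notebooks, get_ipython().run_line_magic); whether the Databricks notebook runtime honours this is exactly the open question here.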

How can this be done?

Am I really stuck with:

%run ./a_notebook
 
%run ./another_notebook
 
%run ./yet_another_hardcoded_notebook_name

Eternally grateful for any help!

4 REPLIES

UmaMahesh1
Honored Contributor III

Hi @Philip Blakeman

You can do this with Scala or Python constructs using threads and futures.

You can download the notebook archive from this link.

https://docs.databricks.com/notebooks/notebook-workflows.html#run-multiple-notebooks-concurrently

After that, set the number of notebooks to run in parallel via the numNotebooksInParallel variable, based on your preference. If you just want one notebook at a time, you can do that too by removing the unnecessary parts.

Be careful not to crash your driver by providing too many parallel notebooks.

Hope this helps. Cheers.
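The same bounded-parallelism idea can be sketched in Python with concurrent.futures. This is a hypothetical wiring, not the notebook from the linked archive: the runner parameter stands in for dbutils.notebook.run (which only exists inside a Databricks notebook), and the worker cap plays the role of numNotebooksInParallel:

```python
from concurrent.futures import ThreadPoolExecutor

def run_notebooks_in_parallel(paths, runner, timeout=600, max_workers=4):
    """Run notebooks concurrently with a bounded thread pool.

    `runner` is assumed to behave like dbutils.notebook.run(path, timeout).
    Capping max_workers keeps the driver from being flooded with jobs
    submitted all at once.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(runner, path, timeout) for path in paths]
        # Collect results in submission order; surface failures as strings,
        # mirroring the .recover block in the Scala notebook.
        results = []
        for future in futures:
            try:
                results.append(future.result())
            except Exception as e:
                results.append(f"ERROR: {e}")
        return results
```

On Databricks you would call run_notebooks_in_parallel(paths, dbutils.notebook.run); note this still creates child jobs, so it does not share the caller's namespace the way %run does.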

UmaMahesh1
Honored Contributor III

Hello @Philip Blakeman

Checking in to see whether your issue got resolved, or if you are facing any problems with the above approach.

Cheers..

Philblakeman
New Contributor III

Hi

Thanks for the answer, but it wasn't what I was after - my question was about being able to provide the list as a variable.

I'll run the notebooks in series.

Basically, can you %run {mynotebook}?

I think the answer is no.

https://stackoverflow.com/questions/74518979/how-to-run-a-list-of-notebooks-in-databricks

Ajay-Pandey
Esteemed Contributor III

Please refer to the code below:

import scala.concurrent.{Future, Await}
import scala.concurrent.duration._
import scala.util.control.NonFatal
 
case class NotebookData(path: String, timeout: Int, parameters: Map[String, String] = Map.empty[String, String])
 
def parallelNotebooks(notebooks: Seq[NotebookData]): Future[Seq[String]] = {
  import scala.concurrent.{Future, blocking, Await}
  import java.util.concurrent.Executors
  import scala.concurrent.ExecutionContext
  import com.databricks.WorkflowException
 
  val numNotebooksInParallel = 4 
  // If you create too many notebooks in parallel the driver may crash when you submit all of the jobs at once. 
  // This code limits the number of parallel notebooks.
  implicit val ec = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(numNotebooksInParallel))
  val ctx = dbutils.notebook.getContext()
  
  Future.sequence(
    notebooks.map { notebook => 
      Future {
        dbutils.notebook.setContext(ctx)
        if (notebook.parameters.nonEmpty)
          dbutils.notebook.run(notebook.path, notebook.timeout, notebook.parameters)
        else
          dbutils.notebook.run(notebook.path, notebook.timeout)
      }
      .recover {
        case NonFatal(e) => s"ERROR: ${e.getMessage}"
      }
    }
  )
}
 
def parallelNotebook(notebook: NotebookData): Future[String] = {
  import scala.concurrent.{Future, blocking, Await}
  import java.util.concurrent.Executors
  import scala.concurrent.ExecutionContext.Implicits.global
  import com.databricks.WorkflowException
 
  val ctx = dbutils.notebook.getContext()
  // The simplest interface we can have, but it doesn't
  // protect against submitting too many notebooks in parallel at once
  Future {
    dbutils.notebook.setContext(ctx)
    
    if (notebook.parameters.nonEmpty)
      dbutils.notebook.run(notebook.path, notebook.timeout, notebook.parameters)
    else
      dbutils.notebook.run(notebook.path, notebook.timeout)
    
  }
  .recover {
    case NonFatal(e) => s"ERROR: ${e.getMessage}"
  }
}
