Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to %run a list of notebooks in Databricks

Philblakeman
New Contributor III

I'd like to %run a list of notebooks from another Databricks notebook.

my_notebooks = ["./setup", "./do_the_main_thing", "./check_results"]
for notebook in my_notebooks:
   %run notebook

This doesn't work, of course. I don't want to use dbutils.notebook.run() as it creates new jobs and doesn't return anything back; I want everything executable and queryable from the main notebook.

I thought perhaps it might be possible to import the actual module and run the function.

Running ?%run shows that the command points to IPython/core/magics/execution.py, and that run is a method of the ExecutionMagics class in the execution module.

So perhaps I could use execution.ExecutionMagics.run() if I created an instance of the class.

But it's beyond me; it seems tricky, and I doubt it would be an effective solution.

How can this be done?

Am I really stuck with:

%run ./a_notebook
 
%run ./another_notebook
 
%run ./yet_another_hardcoded_notebook_name

Eternally grateful for any help!
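One workaround, though it is not %run itself: if the helper notebooks are available as plain .py source files (for example via workspace files or Repos), a loop with exec() runs each file in a shared namespace, which mimics %run's shared-state behaviour. This is just a sketch under that assumption; the file names and the temporary demo file are illustrative, not anything from Databricks:

```python
import os
import tempfile

def run_all(paths, namespace):
    """Execute each .py file in `paths` inside `namespace`, so the
    names it defines remain visible to the caller, like %run does."""
    for path in paths:
        with open(path) as f:
            source = f.read()
        # compile() keeps the filename, so tracebacks point at the file
        exec(compile(source, path, "exec"), namespace)

# Tiny self-contained demo: a stand-in "notebook" that defines a variable.
demo = tempfile.NamedTemporaryFile("w", suffix=".py", delete=False)
demo.write("setup_done = True\n")
demo.close()

ns = {}
run_all([demo.name], ns)
os.unlink(demo.name)
```

In a real notebook you would pass globals() as the namespace, so everything the helper files define lands in the main notebook's scope.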

4 REPLIES

UmaMahesh1
Honored Contributor III

Hi @Philip Blakeman

You can do this in Scala or Python using threads and futures.

You can download the notebook archive from this link.

https://docs.databricks.com/notebooks/notebook-workflows.html#run-multiple-notebooks-concurrently

After that, based on your preference, set how many notebooks run concurrently with the numNotebooksInParallel variable. If you want just one notebook at a time, you can do that too by removing the parallelism.

Be careful not to crash your driver by providing too many parallel notebooks.

Hope this helps. Cheers.

Uma Mahesh D
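The same threads-and-futures idea can be sketched in Python with concurrent.futures. In this sketch the runner is injected as a callable (inside Databricks you would pass dbutils.notebook.run, which only exists on a cluster); the stub runner, the path names, and the 600-second timeout are assumptions for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

def run_notebooks_in_parallel(run_fn, paths, timeout=600, max_workers=4):
    """Run each notebook path via `run_fn` on a bounded thread pool.
    Bounding the pool keeps the driver from being flooded when all
    jobs are submitted at once."""
    def safe_run(path):
        try:
            return run_fn(path, timeout)
        except Exception as e:  # collect failures instead of aborting
            return f"ERROR: {e}"
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order in its results
        return list(pool.map(safe_run, paths))

# Stub runner so the sketch runs anywhere; on a cluster use
# dbutils.notebook.run instead.
results = run_notebooks_in_parallel(
    lambda path, timeout: f"done: {path}",
    ["./setup", "./do_the_main_thing", "./check_results"],
)
```

As with the Scala version, the error handling mirrors the .recover block: a failed notebook yields an "ERROR: ..." string rather than cancelling its siblings.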

UmaMahesh1
Honored Contributor III

Hello @Philip Blakeman

Checking whether your issue got resolved, or whether you are facing any problems with the above approach.

Cheers.

Uma Mahesh D

Philblakeman
New Contributor III

Hi

Thanks for the answer, but it wasn't what I was after; my question was about being able to supply the list of notebooks as a variable.

I'll run the notebooks in series.

Basically can you %run {mynotebook}?

I think the answer is no.

https://stackoverflow.com/questions/74518979/how-to-run-a-list-of-notebooks-in-databricks
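Running in series with dbutils.notebook.run does reduce to a plain loop. A sketch, again with the runner injected so it is executable outside Databricks (the stub runner, paths, and 600-second timeout are placeholders):

```python
def run_notebooks_in_series(run_fn, paths, timeout=600):
    """Run notebooks one after another via `run_fn`
    (dbutils.notebook.run inside Databricks). An exception from any
    notebook aborts the loop, so later notebooks never run against a
    broken state."""
    results = []
    for path in paths:
        results.append(run_fn(path, timeout))
    return results

# Stub runner standing in for dbutils.notebook.run.
serial_results = run_notebooks_in_series(
    lambda path, timeout: f"ok: {path}",
    ["./setup", "./do_the_main_thing", "./check_results"],
)
```

The trade-off versus %run remains: each dbutils.notebook.run call executes in its own job, so state is not shared with the calling notebook; only the string passed to dbutils.notebook.exit comes back as the result.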

Ajay-Pandey
Esteemed Contributor III

Please refer to the code below:

import scala.concurrent.{Future, Await}
import scala.concurrent.duration._
import scala.util.control.NonFatal
 
case class NotebookData(path: String, timeout: Int, parameters: Map[String, String] = Map.empty[String, String])
 
def parallelNotebooks(notebooks: Seq[NotebookData]): Future[Seq[String]] = {
  import scala.concurrent.{Future, blocking, Await}
  import java.util.concurrent.Executors
  import scala.concurrent.ExecutionContext
  import com.databricks.WorkflowException
 
  val numNotebooksInParallel = 4 
  // If you create too many notebooks in parallel the driver may crash when you submit all of the jobs at once. 
  // This code limits the number of parallel notebooks.
  implicit val ec = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(numNotebooksInParallel))
  val ctx = dbutils.notebook.getContext()
  
  Future.sequence(
    notebooks.map { notebook => 
      Future {
        dbutils.notebook.setContext(ctx)
        if (notebook.parameters.nonEmpty)
          dbutils.notebook.run(notebook.path, notebook.timeout, notebook.parameters)
        else
          dbutils.notebook.run(notebook.path, notebook.timeout)
      }
      .recover {
        case NonFatal(e) => s"ERROR: ${e.getMessage}"
      }
    }
  )
}
 
def parallelNotebook(notebook: NotebookData): Future[String] = {
  import scala.concurrent.{Future, blocking, Await}
  import java.util.concurrent.Executors
  import scala.concurrent.ExecutionContext.Implicits.global
  import com.databricks.WorkflowException
 
  val ctx = dbutils.notebook.getContext()
  // The simplest interface we can have, but it doesn't
  // protect against submitting too many notebooks in parallel at once
  Future {
    dbutils.notebook.setContext(ctx)
    
    if (notebook.parameters.nonEmpty)
      dbutils.notebook.run(notebook.path, notebook.timeout, notebook.parameters)
    else
      dbutils.notebook.run(notebook.path, notebook.timeout)
    
  }
  .recover {
    case NonFatal(e) => s"ERROR: ${e.getMessage}"
  }
}

Ajay Kumar Pandey
