How to %run a list of notebooks in Databricks
11-21-2022 05:01 AM
I'd like to %run a list of notebooks from another Databricks notebook.
my_notebooks = ["./setup", "./do_the_main_thing", "./check_results"]
for notebook in my_notebooks:
    %run notebook
This doesn't work, of course. I don't want to use dbutils.notebook.run(), as this creates new jobs and doesn't return anything back; I want everything executable and queryable from the main notebook.
I thought perhaps it might be possible to import the actual module and run the function.
?%run shows that the command points to IPython/core/magics/execution.py, and that run is a method of the ExecutionMagics class in the execution module.
So perhaps I could use execution.ExecutionMagics.run() if I created an instance of the class.
But that's beyond me: it's tricky, and I doubt it's an effective solution.
How can this be done?
Am I really stuck with:
%run ./a_notebook
%run ./another_notebook
%run ./yet_another_hardcoded_notebook_name
Eternally grateful for any help!
Labels: Databricks notebook
11-26-2022 02:08 AM
Hi @Philip Blakeman
You can do this in Scala or Python using threads and futures.
You can download the notebook archive from this link:
https://docs.databricks.com/notebooks/notebook-workflows.html#run-multiple-notebooks-concurrently
After that, set the number of notebooks to run in parallel using the numNotebooksInParallel variable. If you only want one notebook at a time, you can do that too by removing the unnecessary parts.
Be careful not to crash your driver by running too many notebooks in parallel.
Hope this helps. Cheers.
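The same bounded-parallelism idea can be sketched in Python with concurrent.futures. This is a minimal sketch, not the archive's code: run_notebook is a hypothetical stand-in for dbutils.notebook.run(path, timeout), which only exists inside Databricks.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for dbutils.notebook.run(path, timeout); inside
# Databricks you would call dbutils.notebook.run here instead.
def run_notebook(path, timeout_seconds=600):
    return f"finished {path}"

def run_notebooks_in_parallel(paths, num_notebooks_in_parallel=4):
    # Bound the pool size so the driver isn't asked to launch everything at once.
    with ThreadPoolExecutor(max_workers=num_notebooks_in_parallel) as pool:
        futures = [pool.submit(run_notebook, p) for p in paths]
        # Collect results in submission order; turn failures into error strings,
        # mirroring the .recover step in the Scala version.
        results = []
        for f in futures:
            try:
                results.append(f.result())
            except Exception as e:
                results.append(f"ERROR: {e}")
        return results

print(run_notebooks_in_parallel(["./setup", "./do_the_main_thing", "./check_results"]))
# → ['finished ./setup', 'finished ./do_the_main_thing', 'finished ./check_results']
```

Setting num_notebooks_in_parallel to 1 degenerates to serial execution, which covers the "one notebook at a time" case too.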
12-03-2022 12:59 AM
Hello @Philip Blakeman
Checking whether your issue got resolved, or whether you are facing any problems with the above approach.
Cheers.
12-05-2022 01:07 AM
Hi
Thanks for the answer, but it wasn't what I was after. My question was about being able to provide the list of notebooks as a variable.
I'll run the notebooks in series.
Basically: can you %run {mynotebook}?
I think the answer is no.
https://stackoverflow.com/questions/74518979/how-to-run-a-list-of-notebooks-in-databricks
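For running in series, the fallback is a plain loop over dbutils.notebook.run. A minimal sketch, where run_notebook is a hypothetical stand-in for dbutils.notebook.run (only available inside Databricks), with the caveat from the original question: each notebook runs as its own job, so its variables and temp views are not visible from the calling notebook the way they are with %run.

```python
# Hypothetical stand-in for dbutils.notebook.run(path, timeout); in Databricks,
# replace run_notebook with dbutils.notebook.run. Each call blocks until that
# notebook finishes, so the list is executed in order.
def run_notebook(path, timeout_seconds=600):
    return f"ran {path}"

my_notebooks = ["./setup", "./do_the_main_thing", "./check_results"]
results = [run_notebook(nb) for nb in my_notebooks]
```

Each notebook can hand a value back to this loop via dbutils.notebook.exit, which becomes the return value of dbutils.notebook.run.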
12-07-2022 03:09 AM
Please refer to the code below:
import scala.concurrent.{Future, Await}
import scala.concurrent.duration._
import scala.util.control.NonFatal

case class NotebookData(path: String, timeout: Int, parameters: Map[String, String] = Map.empty[String, String])

def parallelNotebooks(notebooks: Seq[NotebookData]): Future[Seq[String]] = {
  import scala.concurrent.{Future, blocking, Await}
  import java.util.concurrent.Executors
  import scala.concurrent.ExecutionContext
  import com.databricks.WorkflowException

  val numNotebooksInParallel = 4
  // If you create too many notebooks in parallel, the driver may crash when you submit all of the jobs at once.
  // This code limits the number of parallel notebooks.
  implicit val ec = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(numNotebooksInParallel))
  val ctx = dbutils.notebook.getContext()

  Future.sequence(
    notebooks.map { notebook =>
      Future {
        dbutils.notebook.setContext(ctx)
        if (notebook.parameters.nonEmpty)
          dbutils.notebook.run(notebook.path, notebook.timeout, notebook.parameters)
        else
          dbutils.notebook.run(notebook.path, notebook.timeout)
      }
      .recover {
        case NonFatal(e) => s"ERROR: ${e.getMessage}"
      }
    }
  )
}

def parallelNotebook(notebook: NotebookData): Future[String] = {
  import scala.concurrent.{Future, blocking, Await}
  import java.util.concurrent.Executors
  import scala.concurrent.ExecutionContext.Implicits.global
  import com.databricks.WorkflowException

  val ctx = dbutils.notebook.getContext()
  // The simplest interface we can have, but it doesn't
  // have protection against submitting too many notebooks in parallel at once.
  Future {
    dbutils.notebook.setContext(ctx)
    if (notebook.parameters.nonEmpty)
      dbutils.notebook.run(notebook.path, notebook.timeout, notebook.parameters)
    else
      dbutils.notebook.run(notebook.path, notebook.timeout)
  }
  .recover {
    case NonFatal(e) => s"ERROR: ${e.getMessage}"
  }
}

