Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Need help with setting up ForEach task in Databricks

Yuppp
New Contributor

Hi everyone,

I have a workflow involving two notebooks: Notebook A and Notebook B. At the end of Notebook A, we generate a variable number of files, let's call it N. I want to run Notebook B for each of these N files.

I know Databricks has a Foreach task that can iterate over a list of items.

Here's what I've tried so far:

output_dir_paths = [<list of paths>]

dbutils.jobs.taskValues.set(key="notebook_A_output_paths", value=output_dir_paths)

The ForEach loop: [screenshot: For Each.jpg]

The task: [screenshot: Task.jpg]

In Notebook B, I'm attempting to read each path like this:

path = dbutils.widgets.get("single_batch_file")


Could someone please help me correct the code to pass the list of paths from Notebook A, iterate over each path, and send it to Notebook B?

1 REPLY

mark_ott
Databricks Employee

You can use the Databricks Workflows foreach task to run Notebook B once for each file generated in Notebook A. The key is to pass each path to Notebook B as a job parameter backed by task values, rather than setting widget values manually. Here's how to structure the workflow step by step:

1. Notebook A: Produce and Pass Output

After you create your output paths (a Python list of strings), set them as a task value:

python
output_dir_paths = [...]  # list of paths generated in Notebook A
dbutils.jobs.taskValues.set(key="notebook_A_output_paths", value=output_dir_paths)

This persists the list as a task value on the job run so downstream tasks can reference it; no widgets are involved at this stage.
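
For reference, a fuller ending for Notebook A might look like the sketch below; the output directory /mnt/output/batches is a made-up example, so substitute your own location:

python
# Hypothetical example: gather the N files Notebook A just wrote.
# "/mnt/output/batches" is a placeholder path, not from the original post.
output_dir = "/mnt/output/batches"
output_dir_paths = [f.path for f in dbutils.fs.ls(output_dir)]

# Publish the list as a task value; the value must be JSON-serializable.
dbutils.jobs.taskValues.set(key="notebook_A_output_paths", value=output_dir_paths)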


2. Workflow Configuration: Foreach (in Databricks Job UI)

  • Task A: Notebook A runs as the first step.

  • Task B: Downstream task set up as a “foreach” loop.

    • In the "items" field, reference the output from Notebook A:

      text
      {{tasks.taskA.values.notebook_A_output_paths}}
    • Each iteration takes one item (path) from this list; inside the nested task, the current value is available as {{input}}.

In the nested Notebook B task, define a parameter named single_batch_file and set its value to {{input}}; a job-as-code sketch of the whole configuration follows below.
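
If you define the job as code rather than through the UI, the equivalent configuration looks roughly like this; the task keys, notebook paths, and concurrency value are illustrative, not from the original post:

json
{
  "tasks": [
    {
      "task_key": "taskA",
      "notebook_task": {"notebook_path": "/Notebooks/NotebookA"}
    },
    {
      "task_key": "foreach_notebook_B",
      "depends_on": [{"task_key": "taskA"}],
      "for_each_task": {
        "inputs": "{{tasks.taskA.values.notebook_A_output_paths}}",
        "concurrency": 5,
        "task": {
          "task_key": "run_notebook_B",
          "notebook_task": {
            "notebook_path": "/Notebooks/NotebookB",
            "base_parameters": {"single_batch_file": "{{input}}"}
          }
        }
      }
    }
  ]
}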


3. Notebook B: Receive and Use the Parameter

In Notebook B, you should register a widget with the same name as the parameter in the workflow:

python
dbutils.widgets.text("single_batch_file", "")
path = dbutils.widgets.get("single_batch_file")
print("Processing", path)

This retrieves the path for each parallel run from the foreach loop.
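
From there, Notebook B can consume the file however it needs. A minimal sketch, assuming the batch files are Parquet (the file format isn't stated in the post):

python
# Assumption: batch files are Parquet; swap the format for csv/json/etc. as needed.
df = spark.read.format("parquet").load(path)
print(f"Processing {path}: {df.count()} rows")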


4. How the Data Flows

  • Notebook A publishes the list via task values.

  • Databricks Job picks up the list and the foreach splits it into N parallel runs, each with a different path.

  • Notebook B receives single_batch_file as a widget (from the job parameter) and processes accordingly.


Key Points

  • Don’t set widget values manually in Notebook A for use in Notebook B; pass data with job parameters and task values instead.

  • Always declare the widget in B using dbutils.widgets.text(...) so jobs can inject the parameter.

  • The parameter name in the workflow (single_batch_file) must exactly match the widget name in Notebook B.
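
One extra tip for interactive development: dbutils.jobs.taskValues.get accepts a debugValue, so a downstream notebook that reads the whole list directly still runs outside a job context. The taskKey and placeholder path below are illustrative:

python
# Reads the full list from Notebook A's task value when run inside the job;
# outside a job run, get() returns debugValue instead of raising.
paths = dbutils.jobs.taskValues.get(
    taskKey="taskA",
    key="notebook_A_output_paths",
    default=[],
    debugValue=["/tmp/example_batch_file"],
)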




Summary Table

Step           Action
Notebook A     Build the list of output paths; publish it with dbutils.jobs.taskValues.set
Workflow (UI)  Add a foreach task; Inputs = {{tasks.taskA.values.notebook_A_output_paths}}
Notebook B     Register the single_batch_file widget and read it

This setup is scalable, robust, and leverages built-in Databricks Workflow best practices for passing dynamic file lists between notebook tasks.
