Need help with setting up ForEach task in Databricks
02-19-2025 06:35 AM - edited 02-19-2025 06:45 AM
Hi everyone,
I have a workflow involving two notebooks: Notebook A and Notebook B. At the end of Notebook A, we generate a variable number of files, let's call it N. I want to run Notebook B for each of these N files.
I know Databricks has a Foreach task that can iterate over a list of items.
Here's what I've tried so far
output_dir_paths = [<list of paths>]
dbutils.jobs.taskvalues.set(key="notebook_A_output_paths", value=output_dir_paths)
ForEach Loop:
The Task:
In Notebook B, I'm attempting to read each path like this:
path = dbutils.widgets.get("single_batch_file")
Could someone please help me correct the code to pass the list of paths from Notebook A, iterate over each path, and send it to Notebook B?
10-31-2025 08:26 AM
You can use the foreach task in Databricks Workflows to run Notebook B once for each file generated in Notebook A. The key is to publish the list of paths as a task value in Notebook A and let the foreach task pass each path to Notebook B as a parameter, rather than setting widgets manually. Here’s how you can structure this workflow step by step:
1. Notebook A: Produce and Pass Output
After you create your output paths (a Python list of strings), set them as a task value:
output_dir_paths = [...] # List of paths generated in Notebook A
dbutils.jobs.taskValues.set(key="notebook_A_output_paths", value=output_dir_paths)
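This stores the list as a task value that downstream tasks in the same job run can reference; it does not set any widget. Note the camel-case taskValues in the call.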
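Task values need to be JSON-serializable, so it can help to normalize the paths to plain strings before setting them. A minimal sketch, assuming the paths may arrive as strings or `pathlib` objects; `to_task_value` is a hypothetical helper and the example paths are made up. The `dbutils` call is left as a comment so the block also runs outside a Databricks notebook:

```python
from pathlib import PurePosixPath


def to_task_value(paths):
    """Convert paths to strings and drop duplicates, keeping the original order."""
    seen = set()
    result = []
    for p in paths:
        s = str(p)
        if s not in seen:
            seen.add(s)
            result.append(s)
    return result


# Hypothetical example paths; in Notebook A these are the files you just wrote.
output_dir_paths = to_task_value([
    PurePosixPath("/mnt/out/batch_0"),
    "/mnt/out/batch_1",
    "/mnt/out/batch_1",  # duplicate, dropped by the helper
])
print(output_dir_paths)

# Inside the notebook, publish the list for downstream tasks:
# dbutils.jobs.taskValues.set(key="notebook_A_output_paths", value=output_dir_paths)
```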
2. Workflow Configuration: Foreach (in Databricks Job UI)
- Task A: Notebook A runs as the first step.
- Task B: a downstream “foreach” task that depends on Task A, with Notebook B as its nested task.
- In the inputs field of the foreach task, reference the task value set in Notebook A (taskA stands for whatever task key you gave Task A):
{{tasks.taskA.values.notebook_A_output_paths}}
- Each iteration picks one item (a path) from this list.
- In the nested Notebook B task, add a parameter, for example named single_batch_file, and set its value to {{input}} so that the current item is passed to the notebook.
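If you define the job through the Jobs API or an asset bundle instead of the UI, the same configuration might look roughly like the following. This is a sketch based on my understanding of the Jobs API field names (`for_each_task`, `inputs`, `concurrency`, `task`) and of the `{{tasks.<task_key>.values.<key>}}` and `{{input}}` references; the task keys, notebook paths, and concurrency value are placeholders, so check the API reference before relying on it:

```python
import json

# Placeholder task keys, notebook paths, and concurrency; adjust to your job.
job_tasks = [
    {
        "task_key": "taskA",
        "notebook_task": {"notebook_path": "/Workspace/path/to/notebook_A"},
    },
    {
        "task_key": "process_files",
        "depends_on": [{"task_key": "taskA"}],
        "for_each_task": {
            # The list that Notebook A stored as a task value.
            "inputs": "{{tasks.taskA.values.notebook_A_output_paths}}",
            # Upper bound on how many iterations run at the same time.
            "concurrency": 4,
            "task": {
                "task_key": "process_files_iteration",
                "notebook_task": {
                    "notebook_path": "/Workspace/path/to/notebook_B",
                    # {{input}} resolves to the current item of the loop.
                    "base_parameters": {"single_batch_file": "{{input}}"},
                },
            },
        },
    },
]

print(json.dumps({"tasks": job_tasks}, indent=2))
```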
3. Notebook B: Receive and Use the Parameter
In Notebook B, you should register a widget with the same name as the parameter in the workflow:
dbutils.widgets.text("single_batch_file", "")
path = dbutils.widgets.get("single_batch_file")
print("Processing", path)
This retrieves the path passed in for the current iteration of the foreach loop.
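Because the widget default above is an empty string, running Notebook B by hand without a parameter would silently process an empty path. One way to guard against that is a small check such as the one below; `resolve_batch_path` is a hypothetical helper, not part of `dbutils`:

```python
def resolve_batch_path(raw_value):
    """Return the trimmed path, or fail loudly if no path was passed in."""
    path = (raw_value or "").strip()
    if not path:
        raise ValueError(
            "single_batch_file is empty; was the notebook run outside the foreach task?"
        )
    return path


# In Notebook B this would wrap the widget read:
# path = resolve_batch_path(dbutils.widgets.get("single_batch_file"))
print(resolve_batch_path("  /mnt/out/batch_0 "))
```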
4. How the Data Flows
- Notebook A emits the list via taskValues.
- The job resolves the list, and the foreach task runs one iteration per path (N in total). Iterations run in parallel only up to the concurrency configured on the foreach task.
- Notebook B receives single_batch_file as a widget (from the task parameter) and processes that path.
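To make the flow concrete, here is a purely local emulation of what the loop does with the list: one parameter set per item, with `{{input}}` replaced by that item. This is not how Databricks implements it, only an illustration; `expand_foreach` and the example paths are hypothetical:

```python
def expand_foreach(items, base_parameters):
    """Build one parameter dict per item by substituting {{input}} with the item."""
    runs = []
    for item in items:
        params = {
            name: value.replace("{{input}}", str(item))
            for name, value in base_parameters.items()
        }
        runs.append(params)
    return runs


runs = expand_foreach(
    ["/mnt/out/batch_0", "/mnt/out/batch_1"],
    {"single_batch_file": "{{input}}"},
)
for run in runs:
    print(run)
```

Each dict corresponds to what one run of Notebook B would see when it reads its widgets.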
Key Points
- Don’t try to manually set widget values in A for use in B; use task values and task parameters instead.
- Declare the widget in B using dbutils.widgets.text(...) so that the notebook has a default when run interactively; in a job run, the injected parameter value takes precedence.
- The parameter name (single_batch_file) in the workflow must match the widget name in the notebook.
Summary Table
| Step | Action |
|---|---|
| Notebook A | Write list of output paths; set using dbutils.jobs.taskValues.set |
| Workflow (UI) | Set foreach loop; inputs = {{tasks.taskA.values.notebook_A_output_paths}}; nested task parameter single_batch_file = {{input}} |
| Notebook B | Register widget single_batch_file and read it |
This setup uses only built-in Databricks Workflows features (task values and the foreach task) to pass a dynamic list of files between notebook tasks, so it keeps working as N changes from run to run.