You can use the For each task in Databricks Workflows to run Notebook B once for each file generated in Notebook A. The key is to pass each path to Notebook B as a task parameter, using task values and dynamic value references rather than manually set widgets. Here’s how to structure the workflow step by step:
1. Notebook A: Produce and Pass Output
After you create your output paths (a Python list of strings), set them as a task value:
```python
output_dir_paths = [...]  # list of paths generated in Notebook A
dbutils.jobs.taskValues.set(key="notebook_A_output_paths", value=output_dir_paths)
```
This stores the list as a task value that downstream tasks in the same job run can reference; no widgets are involved on the producing side.
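For concreteness, here is a minimal sketch of the Notebook A side; the `/mnt/output/batch_*` paths are hypothetical placeholders for whatever your notebook actually writes:

```python
# Notebook A (sketch): produce batch outputs, then publish their paths.
# The paths here are hypothetical placeholders.
output_dir_paths = [f"/mnt/output/batch_{i}" for i in range(3)]

# Task values must be JSON-serializable; a list of strings qualifies.
dbutils.jobs.taskValues.set(key="notebook_A_output_paths", value=output_dir_paths)
```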
2. Workflow Configuration: For each Task (in the Databricks Jobs UI)
- Task A: Notebook A runs as the first step.
- Task B: a downstream task configured as a For each loop.
- In the For each task's Inputs field, reference the output from Notebook A (this assumes Task A's task key is `taskA`):
  `{{tasks.taskA.values.notebook_A_output_paths}}`
- Each iteration picks one item (path) from this list and passes it as a parameter to Notebook B: define an input parameter on the nested notebook task, for example `single_batch_file`, and bind it to `{{input}}` (see the JSON sketch below).
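Behind the UI, the job definition is JSON. A rough sketch of what the for-each task could look like follows; the task keys, notebook path, and concurrency value are placeholder assumptions, and you would normally configure all of this through the UI rather than by hand:

```json
{
  "task_key": "taskB",
  "depends_on": [{ "task_key": "taskA" }],
  "for_each_task": {
    "inputs": "{{tasks.taskA.values.notebook_A_output_paths}}",
    "concurrency": 5,
    "task": {
      "task_key": "taskB_iteration",
      "notebook_task": {
        "notebook_path": "/Workspace/Users/someone@example.com/Notebook_B",
        "base_parameters": {
          "single_batch_file": "{{input}}"
        }
      }
    }
  }
}
```

`{{input}}` resolves to the current iteration's item, so each run of Notebook B receives exactly one path.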
3. Notebook B: Receive and Use the Parameter
In Notebook B, you should register a widget with the same name as the parameter in the workflow:
```python
dbutils.widgets.text("single_batch_file", "")
path = dbutils.widgets.get("single_batch_file")
print("Processing", path)
```
Each iteration of the For each loop injects a different path into this widget, so every parallel run of Notebook B processes its own file.
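Putting it together, a fuller Notebook B cell might look like the following sketch; the Parquet read is an assumption about the file format, so substitute whatever Notebook A actually writes:

```python
# Notebook B (sketch): process the single path injected by the for-each loop.
dbutils.widgets.text("single_batch_file", "")
path = dbutils.widgets.get("single_batch_file")

# Fail fast if the parameter was not injected (e.g., misnamed in the job config).
if not path:
    raise ValueError("single_batch_file is empty; expected the job to supply a path")

# Hypothetical processing step; the format depends on what Notebook A wrote.
df = spark.read.parquet(path)
print(f"Processed {path}: {df.count()} rows")
```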
4. How the Data Flows
- Notebook A emits the list via task values.
- The job resolves the reference, and the For each task fans the list out into one run per path, executed in parallel up to the configured concurrency.
- Notebook B receives `single_batch_file` as a widget value (injected from the task parameter) and processes its file accordingly.
Key Points
- Don’t try to manually set widget values in Notebook A for use in Notebook B; use task parameters and task values instead.
- Always declare the widget in Notebook B with `dbutils.widgets.text(...)` so the job can inject the parameter; the empty default also lets you run the notebook interactively (see the snippet below).
- The parameter name in the workflow (`single_batch_file`) must exactly match the widget name in the notebook.
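One practical consequence of declaring the widget yourself: Notebook B can also be run interactively for testing by supplying a path by hand. As a hypothetical example, in a fresh notebook session you could seed the widget with a sample path (if the widget already exists from an earlier run, type the path into the widget box at the top of the notebook instead):

```python
# Interactive testing only: give the widget a hypothetical sample path as its
# default; in a fresh session, get() returns this default.
dbutils.widgets.text("single_batch_file", "/mnt/output/batch_0")
print(dbutils.widgets.get("single_batch_file"))
```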
Summary Table
| Step | Action |
| --- | --- |
| Notebook A | Write the list of output paths; publish it with `dbutils.jobs.taskValues.set` |
| Workflow (UI) | Configure a For each task; Inputs = `{{tasks.taskA.values.notebook_A_output_paths}}` |
| Notebook B | Register the `single_batch_file` widget and read it |
This setup is scalable and relies only on built-in Databricks Workflows features (task values, dynamic value references, and the For each task) to pass a dynamic file list between notebook tasks.