mark_ott
Databricks Employee

You can use the foreach task in Databricks Workflows to run Notebook B once for each file generated in Notebook A. The key is to pass each path to Notebook B as a job parameter, using task values to carry the list out of Notebook A, rather than setting widgets manually. Here’s how you can structure this workflow step by step:

1. Notebook A: Produce and Pass Output

After you create your output paths (a Python list of strings), set them as a task value:

python
output_dir_paths = [...]  # List of paths generated in Notebook A
dbutils.jobs.taskValues.set(key="notebook_A_output_paths", value=output_dir_paths)

This stores the list as a task value for the current job run, where downstream tasks can reference it. It does not create or set any widgets.
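Task values are stored as JSON, so it can help to check the list before setting it. The sketch below is one possible way to do that. The helper name `prepare_task_value` and the `MAX_TASK_VALUE_BYTES` constant are assumptions made for illustration (confirm the current size limit in the task values documentation), and the `dbutils` call appears only as a comment because `dbutils` exists only inside a Databricks notebook.

```python
import json

# Assumed limit on the JSON size of one task value; verify against current docs.
MAX_TASK_VALUE_BYTES = 48 * 1024


def prepare_task_value(paths):
    """Return a clean list of path strings that is safe to store as a task value."""
    # Convert every entry to a string and drop blank entries.
    cleaned = [str(p).strip() for p in paths if str(p).strip()]
    if not cleaned:
        raise ValueError("no output paths to pass downstream")
    # Measure the JSON representation, since that is what gets stored.
    size = len(json.dumps(cleaned).encode("utf-8"))
    if size > MAX_TASK_VALUE_BYTES:
        raise ValueError(f"task value is {size} bytes, above the assumed limit")
    return cleaned


# Inside Notebook A (Databricks only):
# dbutils.jobs.taskValues.set(
#     key="notebook_A_output_paths",
#     value=prepare_task_value(output_dir_paths),
# )
```

If the list turns out to be too large, one option is to write the paths to a file and pass only that file's location as the task value.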


2. Workflow Configuration: Foreach (in Databricks Job UI)

  • Task A: Notebook A runs as the first step.

  • Task B: Downstream task set up as a “foreach” loop.

    • In the "items" field, reference the output from Notebook A:

      text
      {{tasks.taskA.values.notebook_A_output_paths}}
    • Each iteration will pick one item (path) from this list and pass it as a parameter to Notebook B.

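In the nested task that runs Notebook B, add a parameter, for example named single_batch_file, and set its value to {{input}}. The foreach task replaces {{input}} with the current item on each iteration.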
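The same configuration can also be written as a job definition instead of clicking through the UI. The sketch below builds the `tasks` part of a Jobs API payload as a Python dict. It is an illustration under assumptions: the task keys, the job name, the notebook paths, and the concurrency value are hypothetical, and the field names (`for_each_task`, `inputs`, `concurrency`, `base_parameters`) should be checked against the current Jobs API reference before use.

```python
import json

# Hypothetical task keys, notebook paths, and concurrency; adjust to your workspace.
job_tasks = [
    {
        "task_key": "taskA",
        "notebook_task": {"notebook_path": "/Workspace/Shared/NotebookA"},
    },
    {
        "task_key": "process_paths",
        "depends_on": [{"task_key": "taskA"}],
        "for_each_task": {
            # The list that Notebook A stored as a task value.
            "inputs": "{{tasks.taskA.values.notebook_A_output_paths}}",
            # Maximum number of iterations running at the same time.
            "concurrency": 4,
            "task": {
                "task_key": "process_one_path",
                "notebook_task": {
                    "notebook_path": "/Workspace/Shared/NotebookB",
                    # {{input}} is replaced by the current item on each iteration.
                    "base_parameters": {"single_batch_file": "{{input}}"},
                },
            },
        },
    },
]

# Print the payload so it can be reviewed or pasted into a job definition.
print(json.dumps({"name": "a_then_b_per_path", "tasks": job_tasks}, indent=2))
```

The parameter key under `base_parameters` is the name that Notebook B reads, so it has to match the widget name used there.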


3. Notebook B: Receive and Use the Parameter

In Notebook B, you should register a widget with the same name as the parameter in the workflow:

python
dbutils.widgets.text("single_batch_file", "")
path = dbutils.widgets.get("single_batch_file")
print("Processing", path)

This retrieves the path for the current iteration of the foreach loop.
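Because the widget default is an empty string, a run that did not receive the parameter would process an empty path without complaint. A small check like the one sketched here makes that failure visible. The helper name `resolve_batch_path` is hypothetical, and the `dbutils` lines are shown as comments because they only work inside a Databricks notebook.

```python
def resolve_batch_path(raw_value):
    """Validate the value read from the single_batch_file widget."""
    # Treat None and whitespace-only values the same as an empty string.
    path = (raw_value or "").strip()
    if not path:
        raise ValueError("single_batch_file is empty; was the job parameter set?")
    return path


# Inside Notebook B (Databricks only):
# dbutils.widgets.text("single_batch_file", "")
# path = resolve_batch_path(dbutils.widgets.get("single_batch_file"))
# print("Processing", path)
```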


4. How the Data Flows

  • Notebook A emits the list as a task value.

  • The Databricks job picks up the list, and the foreach task runs one iteration per path (N iterations for N paths), in parallel up to the concurrency configured on the task.

  • Notebook B receives single_batch_file as a widget (from the job parameter) and processes accordingly.
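The flow above can be imitated locally to see what each iteration receives. This is a plain-Python simulation, not Databricks code: `notebook_a`, `notebook_b`, and `run_foreach` are hypothetical stand-ins, the paths are made up, and a thread pool stands in for the foreach task's concurrency setting.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def notebook_a():
    # Stand-in for Notebook A: returns the list that would be set as a task value.
    return ["/tmp/out/part-0", "/tmp/out/part-1", "/tmp/out/part-2"]


def notebook_b(single_batch_file):
    # Stand-in for Notebook B: receives exactly one path per iteration.
    return f"Processing {single_batch_file}"


def run_foreach(items_json, concurrency=1):
    # The list crosses the task boundary as JSON; each element becomes one iteration.
    items = json.loads(items_json)
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map returns results in input order, even when runs overlap.
        return list(pool.map(notebook_b, items))


results = run_foreach(json.dumps(notebook_a()), concurrency=2)
for line in results:
    print(line)
```

Each call to `notebook_b` sees a single string, never the whole list, which is why Notebook B only needs one parameter.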


Key Points

  • Don’t try to manually set widget values in A for use in B; use job parameters and TaskValues instead.

  • Declare the widget in B using dbutils.widgets.text(...). The job parameter overrides the widget's default value, and the default lets the notebook run interactively outside a job.

  • The naming (single_batch_file) in the workflow must match the widget name in the notebook.



Summary Table

| Step | Action |
| --- | --- |
| Notebook A | Write list of output paths; set it using dbutils.jobs.taskValues.set |
| Workflow (UI) | Set up foreach loop; items = {{tasks.taskA.values.notebook_A_output_paths}}; nested task parameter single_batch_file = {{input}} |
| Notebook B | Register widget single_batch_file and read it |

This setup relies only on built-in job features (task values and the foreach task) to pass a dynamic list of file paths between notebook tasks, so it keeps working as the number of files changes and needs no manual widget handling.