Walter_C
Databricks Employee

Dynamic Task Mapping

Databricks Workflows offers a similar concept to Airflow's dynamic task mapping through the "For each" task type.

This allows you to run a task in a loop, passing different parameters to each iteration. Here's how you can replicate the functionality of Airflow's .expand() function:
  1. Create a "For each" task in your Databricks Workflow.
  2. Define the iterable items (similar to what you'd pass to .expand() in Airflow).
  3. Specify a nested task that will be executed for each item in the iterable.

For example, if you have a list of dates to process, you could set up a "For each" task that iterates over these dates and runs a notebook or Python wheel for each one.

Reference: https://docs.databricks.com/en/jobs/for-each.html 
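
For instance, here is a minimal sketch of such a job submitted through the Jobs API (POST /api/2.1/jobs/create) with the requests library. The notebook path, task keys, and date values are illustrative; {{input}} is the dynamic value reference that resolves to the current iteration's item.

```python
import os
import requests

# Workspace URL and a personal access token, e.g. exported as environment variables.
host = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.cloud.databricks.com
token = os.environ["DATABRICKS_TOKEN"]

job_spec = {
    "name": "for-each-demo",
    "tasks": [
        {
            "task_key": "process_dates",
            "for_each_task": {
                # The iterable, analogous to what you'd pass to .expand() in Airflow.
                "inputs": '["2024-01-01", "2024-01-02", "2024-01-03"]',
                "concurrency": 2,  # how many iterations may run in parallel
                "task": {
                    "task_key": "process_one_date",
                    "notebook_task": {
                        # Illustrative notebook path.
                        "notebook_path": "/Workspace/Users/me/process_date",
                        # {{input}} resolves to the current iteration's value.
                        "base_parameters": {"date": "{{input}}"},
                    },
                },
            },
        }
    ],
}

resp = requests.post(
    f"{host}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=job_spec,
)
resp.raise_for_status()
print(resp.json())  # {"job_id": ...}
```

The same structure can also be built in the Workflows UI by adding a "For each" task and nesting the notebook task under it.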

In Databricks Workflows, there isn't a direct equivalent to Airflow's get_current_context() function. However, you can access similar information through different means:

  1. Job Parameters: You can define job-level parameters that are accessible to all tasks within the workflow.

  2. Task Values: Databricks Workflows supports "Task Values," which allow you to set and retrieve small values from tasks. This can be used to pass information between tasks in a workflow (see the sketch after the references below).

  3. Dynamic Values: Databricks Workflows supports dynamic value references, which allow you to access certain runtime information. For example:
    • {{job.run_id}} gives you the current job run ID
    • {{job.start_time}} provides the job start time

  4. Notebook Parameters: If you're using notebook tasks, you can pass parameters to the notebook, which can include runtime information.

References: https://docs.databricks.com/en/jobs/job-parameters.html and https://docs.databricks.com/en/jobs/task-parameters.html
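
To tie items 2-4 together, here is a minimal notebook-side sketch. It assumes a notebook task whose parameters include a "run_id" parameter configured as the dynamic value reference {{job.run_id}}, plus an upstream task with the key "ingest_task"; those parameter names, task keys, and values are illustrative, not part of any Databricks API.

```python
# dbutils is provided automatically by the Databricks notebook runtime.

# 4. Notebook parameters: read values passed to this notebook task.
#    "run_id" is assumed to be set to {{job.run_id}} in the task configuration,
#    so the notebook receives the current job run ID at runtime.
run_id = dbutils.widgets.get("run_id")
process_date = dbutils.widgets.get("process_date")

# 2. Task values: publish a small value from this task ...
dbutils.jobs.taskValues.set(key="rows_processed", value=42)

# ... and, in a downstream task of the same job run, read it back.
rows = dbutils.jobs.taskValues.get(
    taskKey="ingest_task",  # task_key of the upstream task (illustrative)
    key="rows_processed",
    default=0,
    debugValue=0,  # used only when the notebook runs outside a job
)
print(run_id, process_date, rows)
```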