How to pass parameters to a "Job as Task" from code?

JensH
New Contributor III

Hi,

I would like to use the new "Job as Task" feature, but I'm having trouble passing values.

Scenario
I have a workflow job which contains 2 tasks.

  1. Task_A (type "Notebook"): Read data from a table and based on the contents decide, whether the workflow in Task_B should be executed (or not).
  2. Task_B (type "Run Job"): This references another Workflow that itself consists of multiple tasks, all of type "Notebook". The workflow takes several parameters. For the sake of brevity here, let's just assume a parameter "entity_ids".

The referenced workflow from Task_B runs totally fine on its own when started manually.
For this scenario I would like to add some logic to automate execution of that workflow.

Based on what Task_A decides, I would like to start Task_B with the appropriate list of "entity_ids" parameter values.

I tried to set this with dbutils.jobs.taskValues.set("entity_ids", "[1, 2]") in Task_A and read it with dbutils.jobs.taskValues.get("Task_A", "entity_ids", debugValue="[]") in the first notebook of Task_B, but this throws an error within the nested job:
Task key does not exist in run: Task_A.

I guess that the workflow that is referenced in Task_B is unaware of the parent workflow and might be run in a different context, and therefore cannot find the taskKey == "Task_A".

To verify my assumption, I tried changing the type of Task_B into a Notebook. Reading the value now works fine.

My question:
How do I pass (multiple) values into the job that is referenced in Task_B?

Thank you for your help!

Walter_C
Databricks Employee

To pass multiple values into the job that is referenced in Task_B, you can use dynamic value references. Dynamic value references let you reference task values set in upstream tasks. Databricks recommends this approach because it works with multiple task types.
In your case, you can set the values in Task_A using dbutils.jobs.taskValues.set(). For example, if you want to set multiple entity_ids, you can do:

python
dbutils.jobs.taskValues.set(key = "entity_ids", value = [1, 2])
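A slightly fuller sketch of the Task_A side, assuming the decision logic yields a plain list of integer IDs. The helper name `select_entity_ids` and the row fields (`entity_id`, `needs_processing`) are hypothetical and only illustrate the idea. `dbutils` exists only inside a Databricks notebook, so that call is shown as a comment.

```python
import json

def select_entity_ids(rows):
    """Return the IDs of rows flagged for processing (hypothetical row shape)."""
    return [row["entity_id"] for row in rows if row["needs_processing"]]

# Stand-in for the data Task_A would read from the table.
rows = [
    {"entity_id": 1, "needs_processing": True},
    {"entity_id": 2, "needs_processing": True},
    {"entity_id": 3, "needs_processing": False},
]

entity_ids = select_entity_ids(rows)

# Sanity check: the value survives a JSON round trip unchanged.
assert json.loads(json.dumps(entity_ids)) == entity_ids

# Inside the Task_A notebook, the value would then be published with:
# dbutils.jobs.taskValues.set(key="entity_ids", value=entity_ids)
```

Keeping the value a simple list of numbers or strings makes it easy to hand over as a job parameter later.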

Then, in Task_B, you can reference these values using dynamic value references. The syntax for this is {{tasks.Task_A.values.entity_ids}}.

Please note that dynamic value references are used in the job settings, not in the notebook code. This is how you would set it in the job settings:

json
"parameters": {
   "entity_ids": "{{tasks.Task_A.values.entity_ids}}"
}

This way, the entity_ids set in Task_A will be passed to Task_B.
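On the receiving side, a notebook in the referenced job would likely get the parameter as a string and need to parse it. The following is a minimal sketch, assuming the value arrives as a JSON-style string such as "[1, 2]" and that the notebook reads it with dbutils.widgets.get("entity_ids"). The helper name `parse_entity_ids` is hypothetical, and the exact string format is an assumption worth checking in a test run.

```python
import json

def parse_entity_ids(raw):
    """Parse a JSON-style list string such as "[1, 2]" into a list of ints."""
    if raw is None or not raw.strip():
        return []  # treat a missing or empty parameter as "nothing to do"
    parsed = json.loads(raw)
    if not isinstance(parsed, list):
        raise ValueError(f"expected a JSON list, got: {raw!r}")
    return [int(x) for x in parsed]

# Inside a notebook of the referenced job (assumption):
# raw = dbutils.widgets.get("entity_ids")
raw = "[1, 2]"  # stand-in value for local experimentation
entity_ids = parse_entity_ids(raw)
print(entity_ids)
```

Parsing defensively like this keeps the nested job runnable on its own, since a manual run can pass the same string by hand.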


JensH
New Contributor III

Thank you @Walter_C, this helped me a lot and seems to work.

However, I cannot seem to find documentation on the limitations of task values, such as which data types are allowed or how large a value may be. Do you know more?

Walter_C
Databricks Employee

I found the following information: