Re: How to pass parameters to a "Job as Task" from...

JensH · ‎01-11-2024

Hi,

I would like to use the new "Job as Task" feature, but I'm having trouble passing values.

Scenario
I have a workflow job which contains 2 tasks.

Task_A (type "Notebook"): Reads data from a table and, based on the contents, decides whether the workflow in Task_B should be executed.
Task_B (type "Run Job"): This references another Workflow that itself consists of multiple tasks, all of type "Notebook". The workflow takes several parameters. For the sake of brevity here, let's just assume a parameter "entity_ids".

The workflow referenced by Task_B runs fine on its own when started manually.
For this scenario I would like to add some logic to automate execution of that workflow.

Based on what Task_A decides, I would like to start Task_B with the appropriate list of "entity_ids" parameter values.

I tried to set this with dbutils.jobs.taskValues.set("entity_ids", "[1, 2]") in Task_A and to read it with dbutils.jobs.taskValues.get("Task_A", "entity_ids", debugValue="[]") in the first notebook of Task_B, but this throws an error within the nested job:
Task key does not exist in run: Task_A.

I guess that the workflow that is referenced in Task_B is unaware of the parent workflow and might be run in a different context, and therefore cannot find the taskKey == "Task_A".

To verify my assumption, I tried changing the type of Task_B into a Notebook. Reading the value now works fine.

My question:
How do I pass (multiple) values into the job that is referenced in Task_B?

Thank you for your help!

Walter_C · ‎01-20-2024

To pass multiple values into the job that is referenced in Task_B, you can use dynamic value references. Dynamic value references let you refer to task values set in upstream tasks. Databricks recommends this approach because it works with multiple task types.
In your case, you can set the values in Task_A using dbutils.jobs.taskValues.set(). For example, if you want to set multiple entity_ids, you can do:

python
dbutils.jobs.taskValues.set(key = "entity_ids", value = [1, 2])

Then, in Task_B, you can reference these values using dynamic value references. The syntax for this is {{tasks.Task_A.values.entity_ids}}.

Please note that dynamic value references are used in the job settings, not in the notebook code. This is how you would set it in the job settings:

json
"parameters": {
    "entity_ids": "{{tasks.Task_A.values.entity_ids}}"
}

This way, the entity_ids set in Task_A will be passed to Task_B.
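
On the receiving side, the first notebook of the nested job still has to read the parameter. Here is a minimal sketch, assuming the job parameter arrives as a JSON-formatted string such as "[1, 2]" and is read with dbutils.widgets.get("entity_ids"). The helper name parse_entity_ids is hypothetical, and the exact string that the dynamic value reference produces is an assumption worth checking in a test run.

```python
import json
from typing import List, Optional


def parse_entity_ids(raw: Optional[str]) -> List[int]:
    """Turn a job parameter string such as "[1, 2]" into a list of ints.

    Hypothetical helper: assumes the value arrives JSON-encoded.
    """
    if raw is None or not raw.strip():
        return []
    parsed = json.loads(raw)
    # Accept a single scalar as well as a list.
    if not isinstance(parsed, list):
        parsed = [parsed]
    return [int(x) for x in parsed]


# In the nested job's notebook (only available on Databricks), the raw
# string would come from the job parameter, for example:
#   raw = dbutils.widgets.get("entity_ids")
raw = "[1, 2]"  # stand-in value so the sketch runs anywhere
entity_ids = parse_entity_ids(raw)
print(entity_ids)
```

Keeping the parsing in a small function makes it easy to fall back to an empty list when the job is started manually without the parameter.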


JensH · ‎01-21-2024

Thank you @Walter_C , this helped me a lot and seems to work.

However, I cannot seem to find documentation on the limitations of what I can put into task values (which data types are allowed, or how large the data may be). Do you know more?

Walter_C · ‎01-27-2024

I found the following information:

value is the value for this task value’s key. This command must be able to represent the value internally in JSON format. The size of the JSON representation of the value cannot exceed 48 KiB.

You can refer to https://docs.databricks.com/en/workflows/jobs/share-task-context.html
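
If it helps, the limit quoted above can be checked in Task_A before calling dbutils.jobs.taskValues.set(). This is only a rough sketch: the helper names are hypothetical, and it assumes the size is measured on the UTF-8 bytes of a standard json.dumps encoding, which may differ slightly from how Databricks serializes the value internally.

```python
import json

# 48 KiB limit taken from the documentation quoted above.
MAX_TASK_VALUE_BYTES = 48 * 1024


def task_value_size(value) -> int:
    """Approximate size in bytes of the JSON representation of value."""
    return len(json.dumps(value).encode("utf-8"))


def fits_task_value_limit(value) -> bool:
    """Return True if value is JSON-serializable and within the limit."""
    try:
        return task_value_size(value) <= MAX_TASK_VALUE_BYTES
    except (TypeError, ValueError):
        # Not representable as JSON (for example a set or a custom object).
        return False


entity_ids = [1, 2]
if fits_task_value_limit(entity_ids):
    # On Databricks, this is where the value would be set:
    #   dbutils.jobs.taskValues.set(key="entity_ids", value=entity_ids)
    print("ok to set as task value")
else:
    print("too large or not JSON-serializable")
```

For larger payloads, one option is to write the data to a table and pass only a small identifier as the task value.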
