Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to pass parameters to a "Job as Task" from code?

JensH
New Contributor III

Hi,

I would like to use the new "Job as Task" feature, but I'm having trouble passing values to it.

Scenario
I have a workflow job which contains 2 tasks.

  1. Task_A (type "Notebook"): Reads data from a table and, based on the contents, decides whether the workflow in Task_B should be executed or not.
  2. Task_B (type "Run Job"): References another workflow that itself consists of multiple tasks, all of type "Notebook". That workflow takes several parameters; for brevity, let's assume a single parameter "entity_ids".

The workflow referenced by Task_B runs fine on its own when started manually.
For this scenario, I would like to add some logic to automate its execution.

Based on what Task_A decides, I would like to start Task_B with the appropriate list of "entity_ids" parameter values.

I tried setting the value with dbutils.jobs.taskValues.set("entity_ids", "[1, 2]") in Task_A and reading it with dbutils.jobs.taskValues.get("Task_A", "entity_ids", debugValue="[]") in the first notebook of Task_B, but this throws an error within the nested job:
Task key does not exist in run: Task_A.

My guess is that the workflow referenced in Task_B is unaware of the parent workflow and runs in a different context, and therefore cannot find the task key "Task_A".

To verify my assumption, I tried changing the type of Task_B to "Notebook". Reading the value then works fine.

My question:
How do I pass (multiple) values into the job that is referenced in Task_B?

Thank you for your help!

1 ACCEPTED SOLUTION


Walter_C
Databricks Employee

To pass multiple values into the job referenced in Task_B, you can use dynamic value references, which let you reference task values set in upstream tasks. Databricks recommends this approach because it works across multiple task types.
In your case, you can set the values in Task_A using dbutils.jobs.taskValues.set(). For example, if you want to set multiple entity_ids, you can do:

python
dbutils.jobs.taskValues.set(key = "entity_ids", value = [1, 2])

Then, in Task_B, you can reference these values using dynamic value references. The syntax for this is {{tasks.Task_A.values.entity_ids}}.

Please note that dynamic value references are used in the job settings, not in the notebook code. This is how you would set it in the job settings:

json
"parameters": {
   "entity_ids": "{{tasks.Task_A.values.entity_ids}}"
}

This way, the entity_ids set in Task_A will be passed to Task_B.
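
On the receiving side, here is a minimal sketch of how the first notebook in the referenced job could read the forwarded value. This assumes the referenced job defines a job parameter named entity_ids; since parameters arrive as strings, the list needs to be parsed back:

python
import json

# Job parameters are exposed to notebooks as widgets; "entity_ids" is assumed
# to be defined as a job parameter on the referenced job.
entity_ids_raw = dbutils.widgets.get("entity_ids")

# The forwarded task value arrives as a string such as "[1, 2]"; parse it back into a list.
entity_ids = json.loads(entity_ids_raw)
print(entity_ids)  # [1, 2]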


3 REPLIES


JensH
New Contributor III

Thank you @Walter_C, this helped me a lot and seems to work.

However, I can't seem to find documentation about possible limitations on what I can put into task values (supported data types, or how large the data may be). Do you know more?

Walter_C
Databricks Employee

I found the following information: task values must be serializable to JSON, and the size of a value's JSON representation cannot exceed 48 KiB.
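
As an illustrative sketch (the 48 KiB figure is the documented limit on a task value's JSON representation; the helper set_task_value_checked is made up here, not a Databricks API), you could guard against oversized values before setting them:

python
import json

MAX_TASK_VALUE_BYTES = 48 * 1024  # documented limit on a task value's JSON representation

def set_task_value_checked(key, value):
    # Task values must be JSON-serializable; json.dumps raises TypeError otherwise.
    serialized = json.dumps(value)
    if len(serialized.encode("utf-8")) > MAX_TASK_VALUE_BYTES:
        raise ValueError(f"Task value '{key}' exceeds the 48 KiB limit")
    dbutils.jobs.taskValues.set(key=key, value=value)

# Example: a small list of entity ids, well under the limit.
set_task_value_checked("entity_ids", [1, 2])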
