Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Get task_run_id that is nested in a job_run task

ChristianRRL
Honored Contributor

Hi, I'm wondering if there is an easier way to accomplish this.

I can use Dynamic Value reference to pull the run_id of Parent 1 into Parent 2, however, what I'm looking for is for Child 1's task run_id to be referenced within Parent 2.

Currently I am considering using the databricks REST API to get the run_id of a notebook task (Child 1) that is nested inside a run_job task (Parent 1), that I can later reference in another run_job task downstream (Parent 2).

Would there be another/easier way of doing this?

1 ACCEPTED SOLUTION

Accepted Solutions

anuj_lathi
Databricks Employee

Hi, good question. The cleanest way to do this is with task values; no REST API is needed.

Approach: Task Values (Recommended)

In Child 1's notebook, capture its own run_id and set it as a task value:

import json

# Capture this task's own run_id from the notebook context
ctx = json.loads(
    dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson()
)
child1_run_id = ctx["currentRunId"]["id"]

# Publish it as a task value for downstream consumers
dbutils.jobs.taskValues.set(key="child1_run_id", value=str(child1_run_id))

Then in your orchestrator job, when configuring Parent 2's job parameters, reference it with:

{{tasks.Parent1.values.child1_run_id}}


Task values set inside a child job are propagated back through the run_job task, so the orchestrator can access them via {{tasks.<run_job_task_name>.values.<key>}}.
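To make the wiring concrete, here is a minimal sketch of how Parent 2's run_job task settings might look when expressed as the Python dict you would pass to the Jobs API (or define in a bundle). The task keys "Parent1"/"Parent2", the placeholder job_id, and the parameter name "child1_run_id" are assumptions taken from this thread's example, not a definitive configuration:

```python
# Hypothetical Parent 2 task definition: the dynamic value reference is
# resolved by the orchestrator before the child job is triggered.
parent2_task = {
    "task_key": "Parent2",
    "run_job_task": {
        "job_id": 123,  # placeholder: the job Parent 2 triggers
        "job_parameters": {
            # Resolves to the value Child 1 published via taskValues.set
            "child1_run_id": "{{tasks.Parent1.values.child1_run_id}}",
        },
    },
    "depends_on": [{"task_key": "Parent1"}],
}
```

The depends_on edge matters: the reference can only resolve after Parent 1's run has completed and its task values are available.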

Why not {{tasks.Parent1.run_id}}?

As you noticed, {{tasks.Parent1.run_id}} gives you the orchestrator's task run_id for the run_job task itself, not the child job's internal task run_id. That's why task values are the right tool here: they let the child task explicitly publish its own metadata for downstream consumption.

REST API Fallback

If you can't modify Child 1's notebook, then yes, the REST API approach works:

  1. Pass {{tasks.Parent1.run_id}} into an intermediate notebook task
  2. Use the Runs Get API to fetch the triggered child job's run details and extract Child 1's task run_id from the tasks array

But if you can add a couple of lines to Child 1, the task values approach is simpler and avoids API calls entirely.
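If you do go the REST route, the fallback steps above can be sketched as follows. The response-parsing helper is the testable core; the HTTP wrapper around the Jobs Runs Get API (GET /api/2.1/jobs/runs/get) is a hedged sketch, and the host, token, and task key "Child1" are assumptions from this thread:

```python
import json
import urllib.request


def find_task_run_id(run: dict, task_key: str) -> int:
    """Scan the 'tasks' array of a runs/get response for the given task_key
    and return that task's run_id. Raises KeyError if the task is absent."""
    for task in run.get("tasks", []):
        if task.get("task_key") == task_key:
            return task["run_id"]
    raise KeyError(f"task {task_key!r} not found in run {run.get('run_id')}")


def get_task_run_id(host: str, token: str, run_id: int, task_key: str) -> int:
    """Fetch run details via the Jobs Runs Get API, then extract one task's
    run_id. host/token are workspace URL and PAT (assumed available)."""
    url = f"{host}/api/2.1/jobs/runs/get?run_id={run_id}"
    req = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(req) as resp:
        run = json.load(resp)
    return find_task_run_id(run, task_key)
```

In the intermediate notebook task you would call get_task_run_id with the run_id passed in from {{tasks.Parent1.run_id}}, then publish the result with dbutils.jobs.taskValues.set so Parent 2 can consume it the same way as in the recommended approach.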


Hope that helps!

Anuj Lathi
Solutions Engineer @ Databricks

