Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Get task_run_id (or job_run_id) of a *launched* job_run task

ChristianRRL
Honored Contributor

Hi there, I'm finding this a bit trickier than originally expected and am hoping someone can help me understand if I'm missing something.

I have 3 jobs:

  • One orchestrator job (tasks are type run_job)
  • Two "Parent" jobs (tasks are type notebook)
    • parent1 runs the task child1
    • parent2 runs the task child2

I need to get the task_run_id of the *launched* Parent1 job's Child1. Originally, I was exploring using Dynamic Value Reference in order to feed the job parameter

parent1_run_id = {{tasks.parent1.run_id}}
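For context, here is roughly what that looks like in the orchestrator's task settings. This is a sketch, not my exact config: the job_id and task keys are placeholders, and I'm passing the reference through the run_job task's job_parameters:

```json
{
  "task_key": "parent1",
  "run_job_task": {
    "job_id": 123456,
    "job_parameters": {
      "parent1_run_id": "{{tasks.parent1.run_id}}"
    }
  }
}
```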

I was thinking that with this run_id I could use the Databricks REST API to find the child1 run_id. However, what I'm seeing is that because

{{tasks.parent1.run_id}}

corresponds to the orchestrator's parent1 task run, which is different from the actual *launched* parent1 run_id, this looks like a dead end, and I cannot use the REST API to pull the child1 run_id.

Please let me know if I'm missing anything here. Attached are some images for reference.

task_run_id-poc-1.png

task_run_id-poc-2.png

task_run_id-poc-3.png

1 ACCEPTED SOLUTION


emma_s
Databricks Employee

Hi, I ran into the same confusion and did some testing on this. Here's what I found:

Task values don't cross the run_job boundary. So even if child1 sets a task value with dbutils.jobs.taskValues.set(), the orchestrator can't read it.

But {{tasks.parent1.run_id}} is actually still useful — you just need one extra API call.

If you call GET /api/2.1/jobs/runs/get-output with that run_id, the response includes a run_job_output field that has the actual launched job's run_id. From there you can call GET /api/2.1/jobs/runs/get on that to find child1's task_run_id.

So the trick is:

1. Pass {{tasks.parent1.run_id}} to a downstream notebook via base_parameters
2. In that notebook, call get-output with that ID → gives you run_job_output.run_id (the real parent1 run)
3. Call get-run on that → find child1 in the tasks list → grab its run_id
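The steps above can be sketched in Python against the REST API. This is a minimal sketch, not production code: the helper names are mine, and I'm assuming the workspace host, a token, and the orchestrator task's run_id are already available in the notebook (e.g. via base_parameters):

```python
import json
import urllib.parse
import urllib.request


def _api_get(host: str, token: str, path: str, params: dict) -> dict:
    """GET a Jobs API endpoint and decode the JSON response."""
    url = f"{host}{path}?{urllib.parse.urlencode(params)}"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def extract_launched_run_id(get_output_response: dict) -> int:
    """Step 2: the run_job task's output carries the launched job's run_id."""
    return get_output_response["run_job_output"]["run_id"]


def find_task_run_id(get_run_response: dict, task_key: str) -> int:
    """Step 3: locate a task by its task_key in a runs/get response."""
    for task in get_run_response.get("tasks", []):
        if task["task_key"] == task_key:
            return task["run_id"]
    raise KeyError(f"task {task_key!r} not found in run")


def child_task_run_id(host: str, token: str, orchestrator_task_run_id: int,
                      task_key: str = "child1") -> int:
    """Resolve the launched parent1 run, then pull child1's task run_id."""
    out = _api_get(host, token, "/api/2.1/jobs/runs/get-output",
                   {"run_id": orchestrator_task_run_id})
    launched_run_id = extract_launched_run_id(out)
    run = _api_get(host, token, "/api/2.1/jobs/runs/get",
                   {"run_id": launched_run_id})
    return find_task_run_id(run, task_key)
```

The parsing is split into small helpers so you can test the response handling without hitting the API.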

I hope this helps and makes sense. I've tested this on serverless.


2 REPLIES


ChristianRRL
Honored Contributor

Thank you @emma_s, this perfectly sums it up!