Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Get task_run_id that is nested in a job_run task

ChristianRRL
Honored Contributor

Hi, I'm wondering if there is an easier way to accomplish this.

I can use Dynamic Value reference to pull the run_id of Parent 1 into Parent 2, however, what I'm looking for is for Child 1's task run_id to be referenced within Parent 2.

Currently I am considering using the databricks REST API to get the run_id of a notebook task (Child 1) that is nested inside a run_job task (Parent 1), that I can later reference in another run_job task downstream (Parent 2).

Would there be another/easier way of doing this?

1 ACCEPTED SOLUTION


Hi, I would refer to the following cross-post for the solution. 

As @emma_s points out, it basically boils down to:

1. Pass `{{tasks.parent1.run_id}}` to a downstream notebook via `base_parameters`
2. In that notebook, call `get-output` with that ID → gives you `run_job_output.run_id` (the real parent1 run)
3. Call `get-run` on that → find child1 in the tasks list → grab its run_id

Basically, the part I was missing was getting the `run_job_output.run_id` with which to programmatically get the child1 run_id.
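The three steps above can be sketched roughly as follows. This is a hedged outline, not verified against a live workspace: the task keys (`parent1`, `child1`) and helper names are placeholders, and the response fields follow the Jobs API 2.1 `runs/get-output` and `runs/get` endpoints.

```python
# Sketch of the accepted 3-step recipe, standard library only.
# Task keys ("parent1", "child1") and helper names are placeholders.
import json
import urllib.request


def _get(host, token, endpoint, run_id):
    # GET /api/2.1/jobs/runs/<endpoint>?run_id=<run_id>
    req = urllib.request.Request(
        f"{host}/api/2.1/jobs/runs/{endpoint}?run_id={run_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def pick_task_run_id(run_json, task_key):
    # runs/get returns a "tasks" array; each entry carries its own run_id.
    return next(t["run_id"] for t in run_json["tasks"] if t["task_key"] == task_key)


def resolve_child1_run_id(host, token, parent1_task_run_id):
    # Step 2: get-output on the run_job task's run_id exposes the run_id
    # of the job run it actually launched (run_job_output.run_id).
    out = _get(host, token, "get-output", parent1_task_run_id)
    real_parent1_run_id = out["run_job_output"]["run_id"]
    # Step 3: get-run on that launched run, then find child1 in its task list.
    return pick_task_run_id(_get(host, token, "get", real_parent1_run_id), "child1")
```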



anuj_lathi
Databricks Employee

Hi — good question. The cleanest way to do this is with task values, no REST API needed.

Approach: Task Values (Recommended)

In Child 1's notebook, capture its own run_id and set it as a task value:

import json

ctx = json.loads(
    dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson()
)
child1_run_id = ctx["currentRunId"]["id"]

dbutils.jobs.taskValues.set(key="child1_run_id", value=str(child1_run_id))

Then in your orchestrator job, when configuring Parent 2's job parameters, reference it with:

`{{tasks.Parent1.values.child1_run_id}}`

Task values set inside a child job are propagated back through the run_job task, so the orchestrator can access them via `{{tasks.<run_job_task_name>.values.<key>}}`.

Why not {{tasks.Parent1.run_id}}?

As you noticed, `{{tasks.Parent1.run_id}}` gives you the orchestrator's task run_id for the run_job task itself, not the child job's internal task run_id. That's why task values are the right tool here: they let the child task explicitly publish its own metadata for downstream consumption.

REST API Fallback

If you can't modify Child 1's notebook, then yes, the REST API approach works:

  1. Pass {{tasks.Parent1.run_id}} into an intermediate notebook task
  2. Use the Runs Get API to fetch the triggered child job's run details and extract Child 1's task run_id from the tasks array
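As a sketch of step 1, the intermediate notebook task in the orchestrator could receive the value through `base_parameters`. The task key, notebook path, and parameter name below are assumptions; the structure follows the Jobs API task specification.

```json
{
  "task_key": "get_child_run_id",
  "depends_on": [{"task_key": "Parent1"}],
  "notebook_task": {
    "notebook_path": "/path/to/get_child_run_id",
    "base_parameters": {
      "parent1_task_run_id": "{{tasks.Parent1.run_id}}"
    }
  }
}
```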

But if you can add a couple of lines to Child 1, the task values approach is simpler and avoids API calls entirely.


Hope that helps!

Anuj Lathi
Solutions Engineer @ Databricks

I'm sorry, but there are a couple of things I need to call out in your response, @anuj_lathi.

  1. The response was marked as "Accepted Solution", but I did not personally review and accept this as a solution. So I have marked as "Not the Solution".
  2. The response "seems" to be AI generated, which nowadays is not inherently a bad thing so long as it is verified before posting.
  3. The "recommended" approach simply does not work. This functionality does not exist in Databricks as described (which was my original suspicion).
    • I *cannot* reference the child1 task run_id outside of parent1 by parent2
      • {{tasks.parent1.values.child1_run_id}}
    • I *can* reference the parent1 task run_id outside of parent1 by parent2
      • {{tasks.parent1.run_id}}
      • This is not what I'm looking for necessarily, but it seems like the only thing that is exposed via existing Job Workflow functionality

Attached images for reference.

anuj_lathi
Databricks Employee

Hi @ChristianRRL, you're absolutely right, and I apologize for the earlier suggestion. I've verified that task values from child jobs are not propagated back through run_job tasks.

Your instinct about the REST API was correct. Here's the fix:

Solution: Add an intermediate notebook task in the orchestrator

Orchestrator:

  ├── Parent1 (run_job)
  ├── get_child_run_id (notebook task) ← NEW, depends on Parent1
  └── Parent2 (run_job, depends on get_child_run_id)

Notebook (`get_child_run_id`):

import requests

host = dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiUrl().get()
token = dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiToken().get()
headers = {"Authorization": f"Bearer {token}"}

# Get orchestrator run → find Parent1 → get child job run_id → find Child1
job_run_id = spark.conf.get("spark.databricks.job.runId")

orch_run = requests.get(f"{host}/api/2.1/jobs/runs/get",
    headers=headers, params={"run_id": job_run_id}).json()
parent1 = next(t for t in orch_run["tasks"] if t["task_key"] == "Parent1")

child_run = requests.get(f"{host}/api/2.1/jobs/runs/get-output",
    headers=headers, params={"run_id": parent1["run_id"]}).json()
child_job_run_id = child_run["metadata"]["run_id"]

child_job = requests.get(f"{host}/api/2.1/jobs/runs/get",
    headers=headers, params={"run_id": child_job_run_id}).json()
child1 = next(t for t in child_job["tasks"] if t["task_key"] == "Child1")

dbutils.jobs.taskValues.set(key="child1_run_id", value=str(child1["run_id"]))

Then in Parent 2, reference: `{{tasks.get_child_run_id.values.child1_run_id}}`

Can you check if this works?

Apologies again for the earlier response.

 

Regards,
Anuj

Anuj Lathi
Solutions Engineer @ Databricks

Ok, I think this makes much more sense and I see how it could work.

One note on how we're trying to implement this: rather than having the intermediary `get_child_run_id` notebook task, I want to resolve the `child_run_id` inside the parent2 run_job itself, instead of having it passed in via the intermediary step. We can still accomplish this by including the dynamic value reference `{{tasks.parent1.run_id}}` as a job parameter for parent2. Once parent2 has the parent1 run_id, we can follow steps similar to what you outlined.

Thank you for your assistance! This helps confirm my solution path.
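That variant could look roughly like the sketch below. The parameter name (`parent1_task_run_id`) and task key (`child1`) are assumptions, and the REST call is injected through a `fetch` callable so the lookup logic can be exercised without a live workspace; in a real notebook, `fetch` would wrap `requests` or `urllib` with the host and token from the notebook context.

```python
# Hypothetical sketch: parent2 receives a job parameter set to
# {{tasks.parent1.run_id}} and resolves the nested child1 run_id itself.
# fetch(endpoint, run_id) -> parsed JSON from /api/2.1/jobs/runs/<endpoint>.

def resolve_nested_task_run_id(parent1_task_run_id, child_task_key, fetch):
    # get-output on the run_job task's run_id gives the run_id of the
    # job run that the run_job task launched (run_job_output.run_id).
    launched_run_id = fetch("get-output", parent1_task_run_id)["run_job_output"]["run_id"]
    # get-run on that launched run, then find the child task by task_key.
    launched_run = fetch("get", launched_run_id)
    return next(
        t["run_id"]
        for t in launched_run["tasks"]
        if t["task_key"] == child_task_key
    )
```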

Sorry, I have to mark this as not solution again. But I think the issue is becoming clearer. Please see my attached images.

Basically, the issue I'm having is that I can only get the "Child" task run_id for the task at the orchestrator level (e.g. run_fleet_wtg_ge_silver). However, this run_id is different from the actual nested run_id of the launched job and its respective task. Because a run_job task *launches* a separate instance of that job, I am not able to get the nested job > task run_id I need.

Put another way, what I have is:

  • Orchestrator Parent1 > run_job task >>> *launches* >>> Parent1 instance (this is different than the original Parent1) > notebook task (this is the actual Child1 run_id I'm looking for)

Let me know if this makes sense. This is trickier than I was originally thinking. 
