2 weeks ago
Hi there, I'm trying to reference a task value - let's call it `output_path` (not known until it is generated programmatically at runtime) - that is created in a nested task (Child 1) within a run_job task (Parent 1), and pass it as an input parameter - let's call it `input_path` - to a downstream run_job task (Parent 2). I understand that because of the way variable scoping works this may not typically be possible, so I'm looking into possible workarounds.
Some approaches I'm currently considering:
Please let me know if there are other/better approaches I may not be considering, or else if one of the above options is generally more or less recommended.
NOTE: I tried to paste an image, but the paste functionality hasn't been working lately. I've attached a reference image as well in case the paste didn't go through.
2 weeks ago
Quick update, my question effectively boils down to:
Do Databricks Workflows have "global" variables that can be set programmatically at runtime from anywhere in the workflow (e.g., a nested notebook task inside a parent run_job task) and then referenced anywhere else in the workflow, regardless of scope?
Consulting with LLMs, I have some partial answers but still would appreciate some feedback from the community!
Updates on my considered approaches:
2 weeks ago
Hi @ChristianRRL,
No. As of now, Lakeflow Jobs doesn’t provide global, mutable variables that you can set from any task and read from any other task, regardless of scope. This is a current limitation of the platform.
I think you’ve already explored the supported patterns (job parameters, task values, etc.). I'm assuming you have a reason to keep the computation inside a separate child job. If so, the most robust option is to persist output_path to an external store (for example, a Delta table or a Unity Catalog volume / external location) in the child job. In the parent job, add a notebook task that reads that value and re-exposes it via dbutils.jobs.taskValues.set, and then reference it in downstream tasks using a dynamic value reference like {{tasks.<task_name>.values.output_path}}.
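As a sketch of that handoff (the volume path, key, and function names below are illustrative assumptions, not from this thread): Child 1 persists the value to a Unity Catalog volume, and a bridge notebook task in the parent job reads it back and re-exposes it as a task value. `dbutils` is passed in explicitly here purely so the logic can run outside a notebook:

```python
# Sketch only: HANDOFF_PATH is a hypothetical UC volume location.
HANDOFF_PATH = "/Volumes/main/jobs_state/handoff/output_path.txt"

def persist_output_path(output_path: str, handoff_path: str = HANDOFF_PATH) -> None:
    """Run inside Child 1: write the generated path to cross-job storage.
    UC volumes appear as ordinary local paths on the driver."""
    with open(handoff_path, "w") as f:
        f.write(output_path)

def bridge_output_path(dbutils, handoff_path: str = HANDOFF_PATH) -> str:
    """Run in a notebook task of the parent job, AFTER the Run Job (Parent 1)
    task. Reads the persisted value and re-exposes it as a task value so
    downstream tasks can use {{tasks.<this_task>.values.output_path}}."""
    with open(handoff_path) as f:
        output_path = f.read().strip()
    dbutils.jobs.taskValues.set(key="output_path", value=output_path)
    return output_path
```

In the real parent job the bridge task would simply call `bridge_output_path(dbutils)`; a Delta table with a timestamp column works just as well if you need run history or concurrent runs.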
Using GET /api/2.1/jobs/runs/get-output doesn’t give you a global variable either; it’s read-only, in the sense that you can’t use it to set a variable in Lakeflow Jobs. It works best in an external orchestrator pattern: external code runs Parent 1, calls get-output, then starts Parent 2 with that value as a job parameter.
Avoid using workspace DBFS for this kind of cross-job state. Prefer Unity Catalog managed storage instead.
If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.
2 weeks ago
I think this makes a lot of sense.
One follow-up question: do Lakeflow Jobs in some way support the ability for a child task to create or update a parent task value? An example in the context I shared earlier:
Let me know if this makes sense and whether it's even possible.
2 weeks ago
Hi @ChristianRRL,
No. Lakeflow Jobs don’t support a child job/task setting or updating a parent job’s task values.
dbutils.jobs.taskValues.set() always writes a value for the current task in the current job run. There is no way to target a different task or a different job (like the Run Job parent).
Run Job creates a separate job run. Its task values remain scoped to that child job and cannot become the task values of the parent’s Run Job task, nor can they be read by a sibling Run Job (your Parent 2).
To get your pattern working, you still need one of two approaches: either move Child 1 into the same Lakeflow job as Parent 2 and use task values normally, or have Child 1 persist output_path to UC-managed storage and then have the parent job read it and re-expose it via dbutils.jobs.taskValues.set, which Parent 2 and its children can then reference.
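For the first option (everything in one job), the normal task-value flow looks roughly like this. `dbutils` is injected as a parameter purely so the sketch can run outside a notebook, and the task key `generate_path` is an illustrative name:

```python
def publish_output_path(dbutils, output_path: str) -> None:
    """Upstream task (former Child 1, now a task in the same job): task values
    are always written for the *current* task in the *current* job run."""
    dbutils.jobs.taskValues.set(key="output_path", value=output_path)

def read_output_path(dbutils, upstream_task: str = "generate_path") -> str:
    """Any downstream task in the same job run can read the value back, or
    reference it declaratively as {{tasks.generate_path.values.output_path}}.
    debugValue is only used when running interactively outside a job."""
    return dbutils.jobs.taskValues.get(
        taskKey=upstream_task, key="output_path", debugValue="/tmp/dev-default"
    )
```

The declarative `{{tasks.<task>.values.<key>}}` form is usually preferable for wiring task values into task parameters, since it needs no extra code in the consumer.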
If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.