Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

system.lakeflow.job_task_run_timeline table missing task parameters for For each loop input

hgintexas
New Contributor II

The system.lakeflow.job_task_run_timeline table does not include the task-level parameters for the input of a For each loop when the parameter is set dynamically in another notebook using dbutils.jobs.taskValues.set. This information is not included in the API (https://docs.databricks.com/api/workspace/jobs/getrunoutput) either. Where can I find this information to build a job execution log and monitor jobs over time?
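For context, the pattern I'm describing looks roughly like this (task keys, value names, and the query are illustrative, not our actual job):

```python
# Upstream notebook task (task_key "build_inputs" is just an example):
# compute the iteration values dynamically and publish them as a task value.
dates = [row["d"] for row in spark.sql("SELECT DISTINCT d FROM my_catalog.my_schema.src").collect()]
dbutils.jobs.taskValues.set(key="dates", value=dates)

# The For each task then consumes the value through a dynamic value reference
# in its "Inputs" field, e.g.:
#   {{tasks.build_inputs.values.dates}}
# It is this resolved list, and the per-iteration element passed to each nested
# task run, that I cannot find in system.lakeflow.job_task_run_timeline or in
# the Get Run Output API.
```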

3 REPLIES

stbjelcevic
Databricks Employee

Hi @hgintexas ,

You're right: the system job timeline tables and the Jobs Runs API don't currently surface the resolved per-iteration inputs of a For each task when those inputs come from task values set in another notebook with dbutils.jobs.taskValues.set().

The only place Databricks explicitly documents showing task values for a For each run is the Task run details UI "Output" panel, which isn't exposed by the Jobs Get Run Output endpoint. There is no separate API that exposes the same rendered output for aggregation across runs.

A potential workaround: in the upstream task (the one calling dbutils.jobs.taskValues.set), also write the key/value you set to a small Delta table you control, along with identifiers you can later join on (job_id, job_run_id, task_key, and any iteration id or logical key you use) with system.lakeflow.job_task_run_timeline or with system.lakeflow.job_run_timeline. While it isn't the simplest solution, I think it would achieve what you are looking for.
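Here's a minimal sketch of that idea. It assumes the job passes the dynamic value references {{job.id}} and {{job.run_id}} into the upstream notebook as task parameters named job_id and job_run_id, and the Delta table name, task key, and example value are placeholders you'd swap for your own:

```python
from pyspark.sql import functions as F

# Identifiers to join on later; assumes the job passes {{job.id}} and
# {{job.run_id}} as task parameters with these names (placeholder approach).
job_id = dbutils.widgets.get("job_id")
job_run_id = dbutils.widgets.get("job_run_id")
task_key = "build_inputs"                    # this upstream task's task_key
loop_inputs = ["2024-01", "2024-02"]         # whatever you compute dynamically

# Publish the value for the downstream For each task as usual.
dbutils.jobs.taskValues.set(key="loop_inputs", value=loop_inputs)

# Also persist the same key/value, plus join keys, to a Delta table you own.
log_df = (
    spark.createDataFrame(
        [(job_id, job_run_id, task_key, "loop_inputs", str(loop_inputs))],
        "job_id string, job_run_id string, task_key string, key string, value string",
    )
    .withColumn("logged_at", F.current_timestamp())
)
log_df.write.mode("append").saveAsTable("main.ops.task_value_log")

# Later, join the log back to the system table to build an execution log
# (check the system tables reference for the exact column names available).
audit = spark.sql("""
    SELECT t.job_id, t.job_run_id, t.task_key, t.result_state,
           l.key, l.value
    FROM system.lakeflow.job_task_run_timeline AS t
    JOIN main.ops.task_value_log AS l
      ON t.job_id = l.job_id
     AND t.job_run_id = l.job_run_id
     AND t.task_key = l.task_key
""")
```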

hgintexas
New Contributor II

We use for-each loops with dynamic variables across many of our jobs, so addressing this would be a significant effort. Since a new task_id is assigned for each iteration, if one of the iterations fails it's difficult to determine which iteration value caused the failure using the proposed method. Would it be possible to request a feature enhancement to expose For each loop task values in the system.lakeflow.job_task_run_timeline table or via the Runs API?

stbjelcevic
Databricks Employee

I just submitted this as an idea in our internal product-ideas portal, because I agree with you that it would be a good enhancement to Databricks!

However, I can't guarantee a timeline or that our product team will prioritize it in the near future.