Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Keep history of task runs in Databricks Workflows while moving it from one job to another

jkb7
New Contributor III

We are using Databricks Asset Bundles (DAB) to orchestrate multiple workflow jobs, each containing multiple tasks.
The execution schedule is managed at the job level, i.e., all tasks within a job start together.
We often face the need to reschedule: we want to move an existing task, which already has a history of task executions / job runs, from one job to another. By moving the task between jobs, we lose its execution history. This is a problem for us, as we rely on the history of job executions for logging and traceability of what happened in the past.
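
For reference, the setup described might look like the following minimal Databricks Asset Bundle fragment (job name, task keys, and paths are hypothetical; the layout follows the standard `resources.jobs` schema):

```yaml
# databricks.yml (fragment) -- illustrative names only
resources:
  jobs:
    job_a:
      name: job_a
      schedule:
        quartz_cron_expression: "0 0 6 * * ?"  # all tasks start together
        timezone_id: "UTC"
      tasks:
        - task_key: ingest_orders  # moving this task to another job loses its run history
          notebook_task:
            notebook_path: ./notebooks/ingest_orders.py
        - task_key: other_task
          notebook_task:
            notebook_path: ./notebooks/other_task.py
```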

The question is: How is the task execution history associated with the task? Which properties of the task should we keep constant so that the history stays associated with it? Are there task IDs we can make use of for this?

Cheers
Julian
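
Before moving such a task, its run history can at least be exported for traceability. Below is a hedged sketch, assuming the Jobs API 2.1 `jobs/runs/list` response shape (a `runs` array whose entries carry a per-task `tasks` array when requested with `expand_tasks=true`); the sample data, task keys, and timestamps are illustrative, not from the thread:

```python
import json

# Illustrative stand-in for the paginated response of
# GET /api/2.1/jobs/runs/list?job_id=<job A>&expand_tasks=true
SAMPLE_RUNS = [
    {"job_id": 1, "run_id": 10, "start_time": 1700000000000,
     "tasks": [{"task_key": "ingest_orders",
                "state": {"result_state": "SUCCESS"}}]},
    {"job_id": 1, "run_id": 11, "start_time": 1700086400000,
     "tasks": [{"task_key": "other_task",
                "state": {"result_state": "SUCCESS"}}]},
]

def extract_task_history(runs, task_key):
    """Pull the per-task entries for `task_key` out of job-level runs."""
    history = []
    for run in runs:
        for task in run.get("tasks", []):
            if task.get("task_key") == task_key:
                history.append({
                    "job_id": run.get("job_id"),
                    "run_id": run.get("run_id"),
                    "start_time": run.get("start_time"),
                    "state": task.get("state"),
                })
    return history

# Persist this (e.g. to a Delta table) before removing the task from job A.
print(json.dumps(extract_task_history(SAMPLE_RUNS, "ingest_orders")))
```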

1 ACCEPTED SOLUTION


Walter_C
Databricks Employee

Thanks for the explanation. The task execution history is associated with the job ID and the job run in which the task executed. Since you are moving your task from one job to another, there is no way to keep this execution history.

 


6 REPLIES

Walter_C
Databricks Employee

Hello Julian,

Is there any particular reason why you cannot just change the schedule time of the job based on your needs, instead of creating a new job?

jkb7
New Contributor III

Thank you for your response! 🙂

We have several hundred tasks (possibly thousands soon), and we usually group them into a smaller number of jobs, so one job aggregates multiple tasks.

We regularly find ourselves in the situation that a given task which is currently part of job A would be better off as part of job B.

However, we do not want to change the schedule of the other tasks in job A (the job the task originally belonged to). We do not want to change the schedules of job A or job B; we just want to move the given task between them. Sometimes, if we want to run the task on a new schedule that no existing job implements, we create a new job and move the task there.

jkb7
New Contributor III

Shorter: I don't want to change the schedule of the other tasks within the same job.

Walter_C
Databricks Employee

Thanks for the explanation. The task execution history is associated with the job ID and the job run in which the task executed. Since you are moving your task from one job to another, there is no way to keep this execution history.
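
Given that history is keyed by job ID, one workaround is to record which jobs a task has belonged to and stitch the per-job histories together at query time. A minimal sketch, assuming you maintain the task-to-jobs mapping yourself and have already fetched each job's runs for the task (e.g. via `jobs/runs/list` filtered by task key); all IDs and keys below are illustrative:

```python
# Hypothetical bookkeeping: which job IDs a task has lived in, oldest first.
TASK_JOB_HISTORY = {
    "ingest_orders": [101, 202],  # moved from job 101 to job 202
}

def stitched_history(runs_by_job, task_key):
    """Merge the task's run records across every job it has belonged to,
    sorted by start_time, so traceability survives the move."""
    merged = []
    for job_id in TASK_JOB_HISTORY.get(task_key, []):
        for run in runs_by_job.get(job_id, []):
            merged.append({"job_id": job_id, **run})
    return sorted(merged, key=lambda r: r["start_time"])

# runs_by_job maps job_id -> this task's runs in that job (pre-filtered).
runs_by_job = {
    101: [{"run_id": 10, "start_time": 1}],  # runs while the task was in job 101
    202: [{"run_id": 55, "start_time": 2}],  # runs after the move to job 202
}
print(stitched_history(runs_by_job, "ingest_orders"))
```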

 

jkb7
New Contributor III

This sounds like a very naive design. Where/how could I raise an issue about this behavior, with a request to improve the traceability of task execution history?

Walter_C
Databricks Employee
