Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Keep history of task runs in Databricks Workflows while moving it from one job to another

jkb7
New Contributor III

We are using Databricks Asset Bundles (DAB) to orchestrate multiple workflow jobs, each containing multiple tasks.
The execution schedule is managed at the job level, i.e., all tasks within a job start together.
We often face the need to reschedule: we want to move an existing task, which already has a history of task executions / job runs, from one job to another. By moving the task between jobs, we lose its execution history. This is a problem for us, as we rely on the history of job executions for logging and traceability of what happened in the past.
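
For reference, the setup described might look like the following minimal Databricks Asset Bundle fragment (job name, task keys, and paths are hypothetical; the layout follows the standard `resources.jobs` schema):

```yaml
# databricks.yml (fragment) -- illustrative names only
resources:
  jobs:
    job_a:
      name: job_a
      schedule:
        quartz_cron_expression: "0 0 6 * * ?"  # all tasks start together
        timezone_id: "UTC"
      tasks:
        - task_key: ingest_orders  # moving this task to another job loses its run history
          notebook_task:
            notebook_path: ./notebooks/ingest_orders.py
        - task_key: other_task
          notebook_task:
            notebook_path: ./notebooks/other_task.py
```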

The question is: How is the task execution history associated with the task? Which properties of the task should we keep constant so that the history stays associated with it? Are there task IDs we can make use of for this?

Cheers
Julian
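
Before moving such a task, its run history can at least be exported for traceability. Below is a hedged sketch, assuming the Jobs API 2.1 `jobs/runs/list` response shape (a `runs` array whose entries carry a per-task `tasks` array when requested with `expand_tasks=true`); the sample data, task keys, and timestamps are illustrative, not from the thread:

```python
import json

# Illustrative stand-in for the paginated response of
# GET /api/2.1/jobs/runs/list?job_id=<job A>&expand_tasks=true
SAMPLE_RUNS = [
    {"job_id": 1, "run_id": 10, "start_time": 1700000000000,
     "tasks": [{"task_key": "ingest_orders",
                "state": {"result_state": "SUCCESS"}}]},
    {"job_id": 1, "run_id": 11, "start_time": 1700086400000,
     "tasks": [{"task_key": "other_task",
                "state": {"result_state": "SUCCESS"}}]},
]

def extract_task_history(runs, task_key):
    """Pull the per-task entries for `task_key` out of job-level runs."""
    history = []
    for run in runs:
        for task in run.get("tasks", []):
            if task.get("task_key") == task_key:
                history.append({
                    "job_id": run.get("job_id"),
                    "run_id": run.get("run_id"),
                    "start_time": run.get("start_time"),
                    "state": task.get("state"),
                })
    return history

# Persist this (e.g. to a Delta table) before removing the task from job A.
print(json.dumps(extract_task_history(SAMPLE_RUNS, "ingest_orders")))
```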

1 ACCEPTED SOLUTION


Walter_C
Databricks Employee

Thanks for the explanation. The task execution history is associated with the job ID and the job run in which the task executed. Since you are moving your task from one job to another, there is no way to keep this execution history.

 


6 REPLIES

Walter_C
Databricks Employee

Hello Julian,

Is there any particular reason why you cannot just change the schedule time of the job based on your needs, instead of creating a new job?

jkb7
New Contributor III

Thank you for your response! 🙂

We have several hundred tasks (possibly thousands soon), and we usually group them into a smaller number of jobs, so one job aggregates multiple tasks.

We regularly find ourselves in the situation that a given task which is currently part of job A would be better off as part of job B.

However, we do not want to change the schedule of the other tasks in job A (the job the task originally belonged to). We do not want to change the schedules of job A or job B; we just want to move the given task between them. Sometimes, if we want to run the task on a new schedule that no existing job implements, we create a new job and move the task there.

jkb7
New Contributor III

Shorter: I don't want to change the schedule of the other tasks within the same job.

Walter_C
Databricks Employee

Thanks for the explanation. The task execution history is associated with the job ID and the job run in which the task executed. Since you are moving your task from one job to another, there is no way to keep this execution history.
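
Given that history is keyed by job ID, one workaround is to record which jobs a task has belonged to and stitch the per-job histories together at query time. A minimal sketch, assuming you maintain the task-to-jobs mapping yourself and have already fetched each job's runs for the task (e.g. via `jobs/runs/list` filtered by task key); all IDs and keys below are illustrative:

```python
# Hypothetical bookkeeping: which job IDs a task has lived in, oldest first.
TASK_JOB_HISTORY = {
    "ingest_orders": [101, 202],  # moved from job 101 to job 202
}

def stitched_history(runs_by_job, task_key):
    """Merge the task's run records across every job it has belonged to,
    sorted by start_time, so traceability survives the move."""
    merged = []
    for job_id in TASK_JOB_HISTORY.get(task_key, []):
        for run in runs_by_job.get(job_id, []):
            merged.append({"job_id": job_id, **run})
    return sorted(merged, key=lambda r: r["start_time"])

# runs_by_job maps job_id -> this task's runs in that job (pre-filtered).
runs_by_job = {
    101: [{"run_id": 10, "start_time": 1}],  # runs while the task was in job 101
    202: [{"run_id": 55, "start_time": 2}],  # runs after the move to job 202
}
print(stitched_history(runs_by_job, "ingest_orders"))
```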

 

jkb7
New Contributor III

This sounds like a very naive design. Where/how could I raise an issue about this behavior, with a request to improve the traceability of task execution history?

Walter_C
Databricks Employee
