Data Engineering

Re-use jobs as tasks with the same cluster.

devpdi
New Contributor

Hello,

I am facing an issue with my workflow.

I have a job (call it main-job) that, among other things, runs 5 concurrent tasks, which are defined as jobs (not notebooks).

Each of these jobs is identical to the others (call them sub-job-1); the only difference is the job parameters.

Each sub-job-1, among other things, works the same way: it reuses a sub-job-2 with different parameters.

The structure is shown in the attached picture.

I am re-using the jobs more as pipeline-task templates than as genuinely different jobs.

What I want is to run the whole pipeline on the same cluster, so that each job task does not have to wait/block, and because I want to manage resources more efficiently. Expanding these tasks manually would be very time-consuming and error-prone, because the jobs are very long and composed of more than 100 tasks.

Is there a way to do this, either by specifying clusters on job tasks or by defining reusable task templates that are not considered jobs?
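
To make the setup concrete, here is a rough sketch of the current wiring, expressed with the Databricks Python SDK purely for illustration (the job ID, names, and parameters are placeholders, and sub-job-1 wraps sub-job-2 in the same way):

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

SUB_JOB_1_ID = 1111  # placeholder: the existing sub-job-1 job ID

# main-job: 5 concurrent tasks, each one a run of the same sub-job-1
# with different parameters. Each run_job_task resolves its own
# clusters today, which is the overhead I am trying to avoid.
main_job = w.jobs.create(
    name="main-job",
    tasks=[
        jobs.Task(
            task_key=f"sub-job-1-run-{i}",
            run_job_task=jobs.RunJobTask(
                job_id=SUB_JOB_1_ID,
                job_parameters={"partition": str(i)},  # placeholder parameter
            ),
        )
        for i in range(1, 6)
    ],
)
print(main_job.job_id)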

1 REPLY

gchandra
Databricks Employee

Have you tried creating the jobs with the Databricks Python SDK?

https://github.com/databricks/databricks-sdk-py/tree/main/examples/workspace/jobs
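
For example, here is a minimal sketch of that idea, assuming the reusable "task template" becomes a Python function that emits SDK Task objects, the whole pipeline is generated as one flat job, and every task is pinned to a single shared job cluster via job_cluster_key (the cluster spec, notebook path, and parameters below are placeholders):

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute, jobs

w = WorkspaceClient()

SHARED_CLUSTER_KEY = "shared-cluster"

def sub_job_2_tasks(prefix: str, params: dict) -> list:
    # Reusable task template: the tasks sub-job-2 would run, namespaced
    # by `prefix` so the task_keys stay unique inside one flat job.
    return [
        jobs.Task(
            task_key=f"{prefix}-step-{i}",
            job_cluster_key=SHARED_CLUSTER_KEY,  # every task reuses this cluster
            notebook_task=jobs.NotebookTask(
                notebook_path="/Workspace/pipeline/step",  # placeholder
                base_parameters={**params, "step": str(i)},
            ),
            depends_on=[jobs.TaskDependency(task_key=f"{prefix}-step-{i - 1}")]
            if i > 1 else None,
        )
        for i in range(1, 4)
    ]

# Expand the 5 parameterised copies of the template into one job.
all_tasks = []
for i in range(1, 6):
    all_tasks += sub_job_2_tasks(f"sub-job-1-{i}", {"partition": str(i)})

created = w.jobs.create(
    name="main-job-flattened",
    job_clusters=[
        jobs.JobCluster(
            job_cluster_key=SHARED_CLUSTER_KEY,
            new_cluster=compute.ClusterSpec(
                spark_version="15.4.x-scala2.12",  # placeholder
                node_type_id="i3.xlarge",          # placeholder
                num_workers=4,
            ),
        )
    ],
    tasks=all_tasks,
)
print(f"Created job {created.job_id}")

The definition then lives in code rather than as nested jobs, but the 100+ tasks stay maintainable from a single template and can all share the same job cluster.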

 



