
Re-use jobs as tasks with the same cluster.

devpdi
New Contributor

Hello,

I am facing an issue with my workflow.

I have a job (call it main-job) that, among other tasks, runs 5 concurrent tasks, each of which is defined as a job (not a notebook).

Each of these jobs is identical to the others (call them sub-job-1); the only difference is the job parameters.

Each sub-job-1 in turn works the same way: among its other tasks, it re-uses a sub-job-2 with different parameters.

The structure is shown in the attached picture.

I am re-using the jobs as pipeline-task templates rather than as genuinely different jobs.

What I want to do is run the whole pipeline on the same cluster, so that each job-task does not have to wait for or block on its own cluster, and so that I can manage resources more efficiently. Expanding these tasks manually would be very time-consuming and error-prone, because the jobs are very long and composed of more than 100 tasks.

Is there a way to do this, either by specifying clusters on job tasks or by defining reusable task templates that are not considered jobs?
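
To make the structure concrete, this is roughly what it looks like when expressed through the Jobs API via the Databricks Python SDK. The job ID, task names, and parameter values below are placeholders, not the actual configuration:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

SUB_JOB_1_ID = 123  # placeholder: the re-used "sub-job-1" template job
PARAM_SETS = [{"region": r} for r in ("eu", "us", "apac", "latam", "mea")]

# main-job: 5 concurrent "Run Job" tasks, each invoking the same sub-job-1
# with different parameters. Every invocation is a separate job run, so
# each one acquires its own cluster.
main_job = w.jobs.create(
    name="main-job",
    tasks=[
        jobs.Task(
            task_key=f"sub-job-1-{i}",
            run_job_task=jobs.RunJobTask(job_id=SUB_JOB_1_ID, job_parameters=params),
        )
        for i, params in enumerate(PARAM_SETS)
    ],
)
```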

1 REPLY

gchandra
Valued Contributor II

Did you try creating the jobs with the Databricks Python SDK?

https://github.com/databricks/databricks-sdk-py/tree/main/examples/workspace/jobs
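
Since the sub-jobs are really templates, one option along those lines is to generate a single flattened job from Python, so the repeated task groups become a loop and every task points at one shared job cluster. A minimal sketch, assuming the template is a notebook; the notebook path, Spark version, node type, and parameters are placeholders to adapt to your workspace:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute, jobs

w = WorkspaceClient()

# Placeholders -- replace with your template notebook and parameter sets.
NOTEBOOK_PATH = "/Workspace/pipelines/sub_job_template"
PARAM_SETS = [{"region": r} for r in ("eu", "us", "apac", "latam", "mea")]

# One shared job cluster that every task in the job runs on.
shared_cluster = jobs.JobCluster(
    job_cluster_key="shared-cluster",
    new_cluster=compute.ClusterSpec(
        spark_version="15.4.x-scala2.12",
        node_type_id="i3.xlarge",
        num_workers=4,
    ),
)

# Expand the "sub-job template" into plain tasks inside one job instead of
# nesting separate jobs that each spin up their own cluster.
tasks = [
    jobs.Task(
        task_key=f"sub-job-1-{i}",
        job_cluster_key="shared-cluster",
        notebook_task=jobs.NotebookTask(
            notebook_path=NOTEBOOK_PATH,
            base_parameters=params,
        ),
    )
    for i, params in enumerate(PARAM_SETS)
]

created = w.jobs.create(
    name="main-job-expanded",
    job_clusters=[shared_cluster],
    tasks=tasks,
)
print(f"Created job {created.job_id}")
```

The same script can also emit the remaining ~100 sub-job-2 tasks with their own parameters and depends_on links, so the expansion lives in code instead of being maintained by hand in the UI.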

 


