
Re-use jobs as tasks with the same cluster.

devpdi
New Contributor

Hello,

I am facing an issue with my workflow.

I have a job (call it main-job) that, among other tasks, runs 5 concurrent tasks, each of which is defined as a job (not a notebook).

Each of these jobs is identical to the others (call them sub-job-1); the only difference is the job parameters.

Each sub-job-1 in turn works the same way: among its other tasks, it re-uses a sub-job-2 with different parameters.

The structure is shown in the attached picture.

I am re-using the jobs as pipeline-task templates rather than as genuinely different jobs.

What I want to do is run the whole pipeline on the same cluster, so that each job-task does not have to wait for or block on its own cluster, and so that I can manage resources more efficiently. Expanding these tasks manually would be very time-consuming and error-prone, because the jobs are very long and composed of more than 100 tasks.

Is there a way to do this, either by specifying clusters on job tasks or by defining reusable task templates that are not considered jobs?
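
To make the structure concrete, this is roughly what it looks like when expressed through the Jobs API via the Databricks Python SDK. The job ID, task names, and parameter values below are placeholders, not the actual configuration:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

SUB_JOB_1_ID = 123  # placeholder: the re-used "sub-job-1" template job
PARAM_SETS = [{"region": r} for r in ("eu", "us", "apac", "latam", "mea")]

# main-job: 5 concurrent "Run Job" tasks, each invoking the same sub-job-1
# with different parameters. Every invocation is a separate job run, so
# each one acquires its own cluster.
main_job = w.jobs.create(
    name="main-job",
    tasks=[
        jobs.Task(
            task_key=f"sub-job-1-{i}",
            run_job_task=jobs.RunJobTask(job_id=SUB_JOB_1_ID, job_parameters=params),
        )
        for i, params in enumerate(PARAM_SETS)
    ],
)
```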

1 REPLY

gchandra
Valued Contributor II

Did you try creating the jobs with the Databricks Python SDK?

https://github.com/databricks/databricks-sdk-py/tree/main/examples/workspace/jobs
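
Since the sub-jobs are really templates, one option along those lines is to generate a single flattened job from Python, so the repeated task groups become a loop and every task points at one shared job cluster. A minimal sketch, assuming the template is a notebook; the notebook path, Spark version, node type, and parameters are placeholders to adapt to your workspace:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute, jobs

w = WorkspaceClient()

# Placeholders -- replace with your template notebook and parameter sets.
NOTEBOOK_PATH = "/Workspace/pipelines/sub_job_template"
PARAM_SETS = [{"region": r} for r in ("eu", "us", "apac", "latam", "mea")]

# One shared job cluster that every task in the job runs on.
shared_cluster = jobs.JobCluster(
    job_cluster_key="shared-cluster",
    new_cluster=compute.ClusterSpec(
        spark_version="15.4.x-scala2.12",
        node_type_id="i3.xlarge",
        num_workers=4,
    ),
)

# Expand the "sub-job template" into plain tasks inside one job instead of
# nesting separate jobs that each spin up their own cluster.
tasks = [
    jobs.Task(
        task_key=f"sub-job-1-{i}",
        job_cluster_key="shared-cluster",
        notebook_task=jobs.NotebookTask(
            notebook_path=NOTEBOOK_PATH,
            base_parameters=params,
        ),
    )
    for i, params in enumerate(PARAM_SETS)
]

created = w.jobs.create(
    name="main-job-expanded",
    job_clusters=[shared_cluster],
    tasks=tasks,
)
print(f"Created job {created.job_id}")
```

The same script can also emit the remaining ~100 sub-job-2 tasks with their own parameters and depends_on links, so the expansion lives in code instead of being maintained by hand in the UI.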

 


