Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Scheduling Jobs with Multiple Git Repos on a Single Job Cluster

SOlivero
New Contributor III

Hi,

I'm trying to create a scheduled job that runs notebooks from three different repos. However, since a job can only be associated with one repo, I've had to create three separate jobs and a master job that triggers them sequentially.

This setup works, except that job clusters are limited to one job each. This is problematic because starting a new cluster for each job delays execution, and I need the job to run as quickly as possible.

How can I either create a job with multiple repos using a single cluster, or configure multiple jobs to share the same job cluster?

1 REPLY

Brahmareddy
Honored Contributor II

Hi @SOlivero ,

Try configuring a shared all-purpose cluster and pointing each job at that existing cluster instead of creating a new job cluster per job. As long as the cluster is already running when the jobs start, you avoid the startup delay. Another option is to restructure your master job so that the notebooks run sequentially as tasks within a single job definition, linked by task dependencies, which allows all tasks to share the same job cluster. Either way, you no longer start a separate cluster for each job, which should cut the total run time.
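Here is a rough sketch of what the job definition could look like for both options, written as a Python helper that builds a Jobs API 2.1 style payload. Treat it as illustrative only: the helper `build_job_payload`, the job name, the task keys, the notebook paths, and the cluster settings (`spark_version`, `node_type_id`, `num_workers`) are all made-up placeholders, and I'm assuming the notebooks from your three repos are available at workspace paths. Please check the field names against the Jobs API reference for your workspace before relying on it.

```python
import json


def build_job_payload(notebook_paths, existing_cluster_id=None):
    """Build a job definition that runs notebooks one after another on one cluster.

    If existing_cluster_id is given, every task targets that all-purpose
    cluster (option 1). Otherwise every task references one shared job
    cluster declared under "job_clusters" (option 2).
    """
    tasks = []
    for i, path in enumerate(notebook_paths):
        task = {
            "task_key": f"step_{i + 1}",  # hypothetical task names
            "notebook_task": {"notebook_path": path, "source": "WORKSPACE"},
        }
        if i > 0:
            # Chain the tasks so each one waits for the previous one.
            task["depends_on"] = [{"task_key": f"step_{i}"}]
        if existing_cluster_id:
            task["existing_cluster_id"] = existing_cluster_id
        else:
            task["job_cluster_key"] = "shared_cluster"
        tasks.append(task)

    payload = {"name": "multi_repo_pipeline", "tasks": tasks}  # hypothetical name
    if not existing_cluster_id:
        payload["job_clusters"] = [
            {
                "job_cluster_key": "shared_cluster",
                # Placeholder cluster spec; replace with values valid for your cloud.
                "new_cluster": {
                    "spark_version": "13.3.x-scala2.12",
                    "node_type_id": "i3.xlarge",
                    "num_workers": 2,
                },
            }
        ]
    return payload


if __name__ == "__main__":
    # Hypothetical workspace paths, one notebook per repo.
    paths = [
        "/Repos/team/repo_a/notebooks/ingest",
        "/Repos/team/repo_b/notebooks/transform",
        "/Repos/team/repo_c/notebooks/publish",
    ]
    print(json.dumps(build_job_payload(paths), indent=2))
```

Passing `existing_cluster_id="<your-cluster-id>"` switches the same tasks over to the all-purpose cluster from the first option. The resulting dictionary would then be sent to the job create endpoint or pasted into the job's JSON editor.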

Hope this helps!

Regards,

Brahma