Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Databricks Job cluster for continuous run

Ajay-Pandey
Databricks MVP

Hi All

I have a situation where I want to run a job with a continuous trigger using a job cluster, but the cluster is terminated and re-created on every run within the continuous trigger.

I just wanted to know if there is any option to reuse the same job cluster for the entire time the continuous trigger is running.

[Screenshot attachment: AjayPandey_0-1728973783760.png]

Ajay Kumar Pandey
9 REPLIES

radothede
Valued Contributor II

Hey @Ajay-Pandey, for my better understanding, why do you want to run those job runs on the same job cluster?

Hi @radothede, reusing the same job cluster reduces the time spent stopping and re-creating a new cluster on every run.

Ajay Kumar Pandey

JohnMartin
New Contributor II

The Databricks job cluster for continuous runs is designed to automate the execution of your jobs, but managing continuous jobs in Databricks still requires efficient handling of the cluster lifecycle and data tasks.


Rishabh-Pandey
Databricks MVP

@Ajay-Pandey can't we achieve similar functionality with cluster pools? Why don't you try cluster pools?

Rishabh Pandey

@Rishabh-Pandey A pool only helps reduce cluster startup and autoscaling time, and it comes at a higher cost.
We cannot use pools due to cost constraints.

Ajay Kumar Pandey

Did you find a solution? If so, how is the cost calculated in DBUs? Is it a 24/7 cost?

wolffi99
New Contributor

I understand the desire to keep the cluster running for the entire duration. Recreating the cluster on each run is the standard behavior for job clusters, but that doesn't mean there aren't workarounds. Perhaps exploring more persistent cluster policies might offer a small improvement in startup time.

mukul1409
New Contributor II

Hi @Ajay-Pandey,
here is one possible solution:

1. Create an all-purpose cluster, for example:
   continuous-job-cluster, and disable auto-termination or set it to a large value.

2. Configure the job to use existing_cluster_id.
   In the Jobs UI or DAB YAML:
   existing_cluster_id: <cluster-id-of-continuous-job-cluster>
   Now:
    a. The cluster stays alive
    b. Your continuous trigger reuses the same compute
    c. No cold starts
    d. No cluster recreation
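As a sketch, a DAB job definition for step 2 might look like the following. The job name, task key, cluster ID, and notebook path are all hypothetical placeholders, not values from this thread:

```yaml
# Hypothetical Databricks Asset Bundles job definition.
# Replace the cluster ID and notebook path with your own.
resources:
  jobs:
    continuous_job:
      name: continuous-job
      continuous:
        pause_status: UNPAUSED   # keeps the continuous trigger active
      tasks:
        - task_key: main
          # Reuse the long-lived all-purpose cluster instead of a job cluster
          existing_cluster_id: <cluster-id-of-continuous-job-cluster>
          notebook_task:
            notebook_path: /Workspace/Users/<you>/continuous_notebook
```

With `existing_cluster_id` set, each run of the continuous trigger attaches to the same running cluster rather than provisioning a new job cluster.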

3. For streaming workloads
Instead of continuous jobs, write your notebook as a streaming query:
spark.readStream → writeStream.start() → awaitTermination()
Run it on the existing cluster and let Spark manage the lifecycle.
This is how Databricks expects continuous pipelines to run.
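The readStream → writeStream → awaitTermination pattern above can be sketched as a minimal Structured Streaming query. This is an illustrative sketch, not the poster's actual pipeline: the rate source and memory sink are stand-ins chosen only so the example is self-contained (in practice you would read from your real source and write to a Delta table):

```python
from pyspark.sql import SparkSession

# On Databricks, `spark` already exists; getOrCreate() makes this runnable elsewhere too.
spark = SparkSession.builder.getOrCreate()

# Read a stream. The "rate" source emits synthetic rows and is used here
# purely for illustration; substitute your real source (Kafka, Auto Loader, ...).
stream_df = spark.readStream.format("rate").option("rowsPerSecond", 1).load()

# Start the streaming query. The in-memory sink is illustrative only;
# a Delta sink with a checkpoint location is the usual production choice.
query = (
    stream_df.writeStream
    .format("memory")
    .queryName("demo_stream")
    .outputMode("append")
    .start()
)

# Block until the query terminates. On a long-lived cluster, this keeps the
# pipeline running continuously without job-level cluster recreation.
query.awaitTermination()
```

Because `awaitTermination()` blocks for the lifetime of the query, the notebook stays attached and Spark, not the Jobs scheduler, manages restarts and micro-batch scheduling.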

Mukul Chauhan