10-14-2024 11:23 PM - edited 10-14-2024 11:29 PM
Hi All,
I have a situation where I want to run a job on a continuous trigger using a job cluster, but the cluster terminates and gets re-created on every run within the continuous trigger.
I just wanted to know if there is any option to keep using the same job cluster the whole time the continuous trigger is running.
10-15-2024 12:08 AM
Hey @Ajay-Pandey, just so I understand better: why do you want to run those job runs on the same job cluster?
10-15-2024 02:03 AM
Hi @radothede Reusing the same job cluster cuts out the time spent stopping and re-creating a new cluster on every run.
10-15-2024 05:55 AM
Please see if this thread helps - https://community.databricks.com/t5/data-engineering/databricks-job-scheduling-continuous-mode/m-p/3...
10-16-2024 10:34 AM
@Ajay-Pandey Can't we achieve similar functionality with the help of cluster pools? Why don't you try cluster pools?
10-16-2024 07:33 PM
@Rishabh-Pandey A pool will only help reduce the cluster start and autoscaling times, but it comes at a higher cost.
We cannot use pooling due to cost constraints.
12-28-2024 09:08 PM
Did you find any solution? And if so, how is the cost calculated in DBUs? Is it a 24/7 cost?
Wednesday
I understand the desire to keep the cluster running for the entire duration. As it stands, recreating the cluster on each run is the standard behavior, but that doesn't mean there aren't workarounds. Perhaps exploring more persistent cluster policies might offer a little improvement in startup time.
Wednesday
Hi @Ajay-Pandey
The only solution for you:
1. Create an all-purpose cluster called, for example,
continuous-job-cluster, and disable auto-termination or set it to a large value (a sketch of the corresponding API request body follows below).
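For reference, a minimal request body for creating such a cluster via the Clusters API (POST /api/2.0/clusters/create) could look like the sketch below; the cluster name, Spark version, node type, and worker count are placeholders to adapt to your workspace, and autotermination_minutes set to 0 explicitly disables auto-termination:

{
  "cluster_name": "continuous-job-cluster",
  "spark_version": "15.4.x-scala2.12",
  "node_type_id": "i3.xlarge",
  "num_workers": 2,
  "autotermination_minutes": 0
}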
2. Configure the job to use existing_cluster_id.
In the Jobs UI or in DAB YAML (a fuller YAML sketch follows after the list below):
existing_cluster_id: <cluster-id-of-continuous-job-cluster>
Now:
a. The cluster stays alive
b. Your continuous trigger reuses the same compute
c. No cold starts
d. No cluster recreation
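As an illustration, a minimal DAB sketch could look like the following; it assumes the all-purpose cluster from step 1 already exists with auto-termination disabled, and the job name, task key, notebook path, and cluster ID are placeholders:

resources:
  jobs:
    continuous_job:
      name: continuous-job
      continuous:
        pause_status: UNPAUSED   # keeps the continuous trigger running
      tasks:
        - task_key: main
          existing_cluster_id: 0101-123456-abcdefgh   # cluster from step 1
          notebook_task:
            notebook_path: /Workspace/Users/you/continuous_notebook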
3. For streaming workloads
Instead of continuous jobs, write your notebook as a streaming query:
spark.readStream → writeStream.start() → awaitTermination()
Run it on the existing cluster and let Spark manage the lifecycle.
This is how Databricks expects continuous pipelines to run.
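To make that concrete, here is a minimal PySpark sketch of such a long-running streaming query; the rate source is Spark's built-in test source standing in for a real source like Kafka or Auto Loader, and the checkpoint and output paths are hypothetical:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Built-in test source that emits one row per second;
# replace with your real source (Kafka, Auto Loader, etc.).
events = spark.readStream.format("rate").load()

query = (
    events.writeStream
    .format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/demo")  # hypothetical path
    .outputMode("append")
    .start("/tmp/tables/demo")  # hypothetical output path
)

# Blocks so the run stays alive for as long as the query is running.
query.awaitTermination()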