Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to reuse a cluster with Databricks Asset bundles

sumitdesai
New Contributor II

I am using Databricks Asset Bundles as an IaC tool with Databricks. I want to create a cluster using DAB and then reuse the same cluster in multiple jobs, but I cannot find an example for this. All the examples I have found specify an individual new cluster when defining a job. How can we reuse clusters?

3 REPLIES

Wojciech_BUK
Valued Contributor III

Hello,
Jobs are specific in Databricks; a job definition also contains the cluster definition because when you run a job, a new cluster is created based on the cluster specification you provided for the job, and it exists only until the job is completed. You can define a cluster on the job level or for individual tasks. You can use the same cluster within a job, so multiple tasks can be run on the same cluster.

If you want, you can share a cluster between jobs, BUT! it will have to be an all-purpose cluster, which costs roughly twice as many DBUs. It is not recommended to use all-purpose clusters for jobs unless you have very specific needs.

Asset bundles are not very well documented yet (in public preview), so you can always refer to the API documentation:

API Documentation Link

 

In the documentation, there is an example where the script uses:

existing_cluster_id

This is the ID of the all-purpose cluster, which you can find in the cluster's JSON definition.


so in YAML it will be:

      tasks:
        - task_key: notebook_task
          existing_cluster_id: <id-of-your-existing-cluster>
But as I mentioned, it is recommended to use job clusters instead. You can define multiple job clusters, for example:

      job_clusters:
        - job_cluster_key: <some-unique-programmatic-identifier-for-this-key>
          new_cluster:
            # Cluster settings.
Then reference them WITHIN the job by setting job_cluster_key in each task specification.
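Put together, the job definition might look like this (a sketch only; the key name, cluster settings, and notebook paths are placeholders I have made up, not values from the docs):

```yaml
job_clusters:
  - job_cluster_key: shared_job_cluster      # placeholder key name
    new_cluster:
      spark_version: 13.3.x-scala2.12        # assumed cluster settings
      node_type_id: Standard_DS3_v2
      num_workers: 2
tasks:
  - task_key: first_task
    job_cluster_key: shared_job_cluster      # both tasks run on this job cluster
    notebook_task:
      notebook_path: ./src/first_notebook.py # hypothetical path
  - task_key: second_task
    job_cluster_key: shared_job_cluster
    notebook_task:
      notebook_path: ./src/second_notebook.py
```

Because both tasks point at the same job_cluster_key, the job spins up one cluster and runs both tasks on it, then tears it down when the job finishes.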

In this section of documentation you can see how you can do it:

https://learn.microsoft.com/en-us/azure/databricks/workflows/jobs/how-to/use-bundles-with-jobs#step-...

 

felix_
New Contributor II

Hi, would it also be possible to reuse the same job cluster for multiple "Run Job" Tasks?

sumitdesai
New Contributor II

I can think of a way if you are fine with running those jobs one after another: create a new job and add multiple tasks, one corresponding to each job, and chain them together. You will need to configure just one job cluster, and the same cluster should get reused by all the tasks.
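A sketch of that workaround in bundle YAML, assuming the separate jobs' notebooks are pulled in as ordinary tasks chained with depends_on (all task names, settings, and paths here are hypothetical):

```yaml
resources:
  jobs:
    combined_job:
      name: combined_job
      job_clusters:
        - job_cluster_key: shared_cluster        # the single cluster all tasks reuse
          new_cluster:
            spark_version: 13.3.x-scala2.12      # assumed cluster settings
            node_type_id: Standard_DS3_v2
            num_workers: 2
      tasks:
        - task_key: formerly_job_one             # hypothetical task name
          job_cluster_key: shared_cluster
          notebook_task:
            notebook_path: ./notebooks/job_one.py
        - task_key: formerly_job_two
          depends_on:
            - task_key: formerly_job_one         # chain so they run one after another
          job_cluster_key: shared_cluster
          notebook_task:
            notebook_path: ./notebooks/job_two.py
```

The trade-off is that the original jobs' contents move into one job; the tasks then run sequentially on a single job cluster instead of each "Run Job" task creating its own.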
