Data Engineering

How to reuse a cluster with Databricks Asset Bundles

sumitdesai

I am using Databricks Asset Bundles (DABs) as an infrastructure-as-code (IaC) tool with Databricks. I want to create a cluster with DABs and then reuse that same cluster in multiple jobs, but I cannot find an example of this. All the examples I found define a new cluster for each job. How can I reuse a cluster across jobs?

Wojciech_BUK

Hello,
Jobs work a bit differently in Databricks: a job definition also contains its cluster definition, because every job run creates a new cluster from the specification you provided, and that cluster exists only until the run completes. You can define a cluster at the job level or for individual tasks, and within a single job you can reuse the same cluster so that multiple tasks run on it.
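
For comparison, here is a minimal sketch of the per-task option in a bundle's databricks.yml: each task declares its own new_cluster, so each task gets a separate cluster and nothing is shared. The job name, task keys, notebook paths, and cluster settings (spark_version, an Azure node_type_id, num_workers) are hypothetical placeholders, not values from this thread.

```yaml
# Hypothetical example: every task defines its own cluster (nothing is shared)
resources:
  jobs:
    my_job:
      name: my_job
      tasks:
        - task_key: task_one
          new_cluster:              # cluster created for task_one only
            spark_version: 13.3.x-scala2.12
            node_type_id: Standard_DS3_v2
            num_workers: 1
          notebook_task:
            notebook_path: ./src/task_one.py
        - task_key: task_two
          depends_on:
            - task_key: task_one
          new_cluster:              # a second, separate cluster for task_two
            spark_version: 13.3.x-scala2.12
            node_type_id: Standard_DS3_v2
            num_workers: 1
          notebook_task:
            notebook_path: ./src/task_two.py
```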

You can share a cluster between jobs, BUT it has to be an All-Purpose Cluster, which is billed at a considerably higher DBU rate than a job cluster (roughly 2x, depending on your cloud and pricing tier). Using All-Purpose Clusters for jobs is not recommended unless you have very specific needs.

Asset Bundles are still in Public Preview and not yet well documented. The job settings in a bundle follow the Jobs API request format, so you can always refer to the API documentation:

API Documentation Link

In the documentation, there is an example that uses the existing_cluster_id field. This is the ID of an All-Purpose Cluster, which you can find in the cluster's JSON definition.

So in YAML it would be:

    tasks:
      - task_key: notebook_task
        existing_cluster_id: <ID of your existing cluster>
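
To reuse the same All-Purpose Cluster in several jobs without repeating its ID, one option is a bundle custom variable, referenced as ${var.<name>}. A minimal sketch, assuming two hypothetical jobs (job_a and job_b), a placeholder cluster ID, and made-up notebook paths:

```yaml
# Hypothetical example: two jobs pointing at the same All-Purpose Cluster
bundle:
  name: my_bundle

variables:
  shared_cluster_id:
    description: ID of the existing All-Purpose Cluster shared by both jobs
    default: 0000-000000-abcdefgh   # placeholder, replace with your cluster's ID

resources:
  jobs:
    job_a:
      name: job_a
      tasks:
        - task_key: notebook_task
          existing_cluster_id: ${var.shared_cluster_id}
          notebook_task:
            notebook_path: ./src/job_a_notebook.py
    job_b:
      name: job_b
      tasks:
        - task_key: notebook_task
          existing_cluster_id: ${var.shared_cluster_id}
          notebook_task:
            notebook_path: ./src/job_b_notebook.py
```

The default can be overridden per deployment, for example with databricks bundle deploy --var="shared_cluster_id=<your-cluster-id>". Both jobs then run on All-Purpose compute, with the cost trade-off described earlier.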

But, as I mentioned, Job Clusters are the recommended option. You can define multiple job clusters in one job, for example two:

    job_clusters:
      - job_cluster_key: <some-unique-programmatic-identifier-for-this-key>
        new_cluster:
          # Cluster settings.
      - job_cluster_key: <another-unique-programmatic-identifier>
        new_cluster:
          # Cluster settings.

Then use them WITHIN that job by setting job_cluster_key in each task's specification.
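
Putting this together, here is a sketch of one job with two job clusters, where each task picks its cluster through job_cluster_key. Everything here (bundle, job, and task names, notebook paths, Azure node types, worker counts, runtime version) is a hypothetical placeholder. Both entries are job clusters, so they exist only for that job's runs and cannot be shared with other jobs.

```yaml
# Hypothetical databricks.yml: two job clusters shared by tasks of one job
bundle:
  name: my_bundle

resources:
  jobs:
    etl_job:
      name: etl_job
      job_clusters:
        # Smaller cluster, referenced by both the ingest and validate tasks
        - job_cluster_key: small_cluster
          new_cluster:
            spark_version: 13.3.x-scala2.12
            node_type_id: Standard_DS3_v2
            num_workers: 1
        # Larger cluster, referenced only by the transform task
        - job_cluster_key: large_cluster
          new_cluster:
            spark_version: 13.3.x-scala2.12
            node_type_id: Standard_DS4_v2
            num_workers: 4
      tasks:
        - task_key: ingest
          job_cluster_key: small_cluster
          notebook_task:
            notebook_path: ./src/ingest.py
        - task_key: transform
          depends_on:
            - task_key: ingest
          job_cluster_key: large_cluster
          notebook_task:
            notebook_path: ./src/transform.py
        - task_key: validate
          depends_on:
            - task_key: transform
          job_cluster_key: small_cluster
          notebook_task:
            notebook_path: ./src/validate.py
```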

This section of the documentation shows how to do it:

https://learn.microsoft.com/en-us/azure/databricks/workflows/jobs/how-to/use-bundles-with-jobs#step-...
