Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to reuse a cluster with Databricks Asset bundles

sumitdesai
New Contributor II

I am using Databricks Asset Bundles as an IaC tool with Databricks. I want to create a cluster using DAB and then reuse the same cluster in multiple jobs, but I cannot find an example of this. All the examples I have found specify an individual new cluster as part of each job definition. How can we reuse clusters?

1 REPLY

Wojciech_BUK
Valued Contributor III

Hello,
Jobs in Databricks are a special case: a job definition also contains the cluster definition, because when you run a job, a new cluster is created from the cluster specification you provided for the job and exists only until the job completes. You can define a cluster at the job level or for individual tasks, and you can use the same cluster within a job so that multiple tasks run on the same cluster.

If you want, you can share a cluster between jobs, BUT it will have to be an All Purpose Cluster, which costs about 2x more DBUs. It is not recommended to use all-purpose clusters for jobs unless you have very specific needs.

Asset bundles are not very well documented yet (they are still in Public Preview), so you can always refer to the API documentation:

API Documentation Link

 

In the documentation, there is an example that uses:

existing_cluster_id

This is the ID of the all-purpose cluster, which you can find in the cluster's JSON definition.

[Screenshot: the all-purpose cluster's JSON definition, showing where the cluster ID appears]

So in YAML it will be:

      tasks:
        - task_key: notebook_task
          existing_cluster_id: <id-of-your-existing-cluster>

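For example, here is a minimal sketch of a bundle's resources section where two jobs reuse the same all-purpose cluster (the job names, notebook paths, and cluster ID are placeholders, not values from this thread):

      resources:
        jobs:
          ingest_job:                                      # hypothetical job name
            name: ingest_job
            tasks:
              - task_key: ingest
                existing_cluster_id: 0123-456789-abcdefgh  # ID of your all-purpose cluster
                notebook_task:
                  notebook_path: ../src/ingest.ipynb       # hypothetical notebook path
          transform_job:                                   # hypothetical job name
            name: transform_job
            tasks:
              - task_key: transform
                existing_cluster_id: 0123-456789-abcdefgh  # same cluster ID, reused
                notebook_task:
                  notebook_path: ../src/transform.ipynb    # hypothetical notebook path

Both jobs then run on that one long-running all-purpose cluster instead of spinning up their own job clusters.
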
But as I mentioned, it is recommended to use job clusters. You can define multiple job clusters, for example two clusters:

      job_clusters:
        - job_cluster_key: <some-unique-programmatic-identifier-for-this-key>
          new_cluster:
            # Cluster settings.

And use them WITHIN the job by assigning a job_cluster_key to the task specifications, as in the sketch below.
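
A minimal sketch of a single job that does this, assuming placeholder names and cluster settings; one job cluster definition is shared by two tasks:

      resources:
        jobs:
          etl_job:                                         # hypothetical job name
            name: etl_job
            job_clusters:
              - job_cluster_key: shared_job_cluster
                new_cluster:
                  spark_version: 14.3.x-scala2.12          # example Databricks Runtime
                  node_type_id: Standard_DS3_v2            # example Azure node type
                  num_workers: 2
            tasks:
              - task_key: extract
                job_cluster_key: shared_job_cluster        # runs on the shared job cluster
                notebook_task:
                  notebook_path: ../src/extract.ipynb      # hypothetical notebook path
              - task_key: load
                depends_on:
                  - task_key: extract
                job_cluster_key: shared_job_cluster        # same cluster as the extract task
                notebook_task:
                  notebook_path: ../src/load.ipynb         # hypothetical notebook path

The cluster is created when the job run starts, both tasks execute on it, and it terminates when the run finishes, so it is still scoped to that single job rather than shared across jobs.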

In this section of the documentation you can see how to do it:

https://learn.microsoft.com/en-us/azure/databricks/workflows/jobs/how-to/use-bundles-with-jobs#step-...

 
