Posted in Data Engineering

How to reuse a cluster with Databricks Asset Bundles

sumitdesai — Fri, 01 Mar 2024 14:06:00 GMT

I am using Databricks Asset Bundles (DABs) as an IaC tool with Databricks. I want to create a cluster using DABs and then reuse the same cluster in multiple jobs, but I cannot find an example of this. All the examples I found define a separate new cluster inside each job definition. How can we reuse clusters?

Wojciech_BUK — Sun, 03 Mar 2024 10:39:58 GMT

Hello,
Jobs work a bit differently in Databricks: a job definition also contains the cluster definition, because when you run a job, a new cluster is created from the cluster specification you provided for that job, and it exists only until the job run completes. You can define a cluster at the job level or for individual tasks. Within a job, multiple tasks can run on the same cluster.
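
As a minimal sketch of the above in bundle YAML (the job name, task keys, notebook paths, runtime version, and Azure node types are all placeholders, not values from this thread), two tasks share one job-level cluster through the same job_cluster_key, while a third task defines its own task-level cluster:

```yaml
# Hypothetical bundle fragment: every name, path, and cluster setting is a placeholder.
resources:
  jobs:
    my_job:
      name: my_job
      job_clusters:
        - job_cluster_key: shared_job_cluster
          new_cluster:
            spark_version: 13.3.x-scala2.12   # pick a runtime available in your workspace
            node_type_id: Standard_DS3_v2     # Azure node type; use your cloud's equivalent
            num_workers: 2
      tasks:
        # These two tasks reference the same job_cluster_key, so they run on one job cluster.
        - task_key: ingest
          job_cluster_key: shared_job_cluster
          notebook_task:
            notebook_path: ../src/ingest.ipynb
        - task_key: transform
          job_cluster_key: shared_job_cluster
          notebook_task:
            notebook_path: ../src/transform.ipynb
        # A task can instead define its own cluster at the task level.
        - task_key: report
          new_cluster:
            spark_version: 13.3.x-scala2.12
            node_type_id: Standard_DS3_v2
            num_workers: 1
          notebook_task:
            notebook_path: ../src/report.ipynb
```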

If you want, you can share a cluster between jobs, BUT it has to be an all-purpose cluster, which is billed at a considerably higher DBU rate than job clusters (roughly 2x). Using all-purpose clusters for jobs is not recommended unless you have very specific needs.
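
If you do go the all-purpose route, newer releases of the Databricks CLI may let you define the cluster itself in the bundle under resources.clusters and reference its ID from several jobs. Treat this as an assumption to verify against the bundle documentation for your CLI version, since older versions may not support it; all names and settings below are hypothetical:

```yaml
# Sketch assuming your CLI version supports the `clusters` bundle resource.
resources:
  clusters:
    shared_cluster:
      cluster_name: shared-all-purpose-cluster   # hypothetical name
      spark_version: 13.3.x-scala2.12
      node_type_id: Standard_DS3_v2
      num_workers: 2
      autotermination_minutes: 30   # terminate when idle to limit all-purpose costs
  jobs:
    job_a:
      name: job_a
      tasks:
        - task_key: main
          # Both jobs point at the same bundle-managed all-purpose cluster.
          existing_cluster_id: ${resources.clusters.shared_cluster.id}
          notebook_task:
            notebook_path: ../src/job_a.ipynb
    job_b:
      name: job_b
      tasks:
        - task_key: main
          existing_cluster_id: ${resources.clusters.shared_cluster.id}
          notebook_task:
            notebook_path: ../src/job_b.ipynb
```

Setting autotermination_minutes keeps an idle all-purpose cluster from running (and billing) indefinitely between job runs.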

Asset Bundles are not very well documented yet (they are in Public Preview), but you can always refer to the API documentation:

API Documentation Link

In the documentation, there is an example where the task uses:

existing_cluster_id

This is the ID of an all-purpose cluster, which you can find in the cluster's JSON definition.

So in YAML it will be:

```yaml
tasks:
  - task_key: notebook_task
    existing_cluster_id: <id-of-your-existing-cluster>
```
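
If the all-purpose cluster already exists outside the bundle, a lookup variable can resolve its ID from the cluster name, so two jobs can share it without hard-coding the ID. A sketch, assuming a cluster named shared-all-purpose-cluster exists (a hypothetical name) and that your CLI version supports lookup variables:

```yaml
# Resolve the existing cluster's ID by name at deploy time (cluster name is hypothetical).
variables:
  shared_cluster_id:
    description: ID of the shared all-purpose cluster
    lookup:
      cluster: shared-all-purpose-cluster

resources:
  jobs:
    job_a:
      name: job_a
      tasks:
        - task_key: main
          existing_cluster_id: ${var.shared_cluster_id}
          notebook_task:
            notebook_path: ../src/job_a.ipynb
    job_b:
      name: job_b
      tasks:
        - task_key: main
          existing_cluster_id: ${var.shared_cluster_id}
          notebook_task:
            notebook_path: ../src/job_b.ipynb
```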

But as I mentioned, it is recommended to use job clusters. You can define multiple job clusters (for example, 2 clusters), each as an entry like this:

```yaml
job_clusters:
  - job_cluster_key: <some-unique-programmatic-identifier-for-this-key>
    new_cluster:
      # Cluster settings.
```

Then use them within the job by setting job_cluster_key in each task's specification.
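
A sketch of the two-cluster case, with hypothetical keys, paths, and sizes: a small cluster for the light tasks and a larger one for the heavy step, with each task picking its cluster via job_cluster_key:

```yaml
# Hypothetical job with two job clusters; tasks choose one by key.
resources:
  jobs:
    my_job:
      name: my_job
      job_clusters:
        - job_cluster_key: small_cluster
          new_cluster:
            spark_version: 13.3.x-scala2.12
            node_type_id: Standard_DS3_v2
            num_workers: 1
        - job_cluster_key: large_cluster
          new_cluster:
            spark_version: 13.3.x-scala2.12
            node_type_id: Standard_DS4_v2
            num_workers: 8
      tasks:
        - task_key: extract
          job_cluster_key: small_cluster
          notebook_task:
            notebook_path: ../src/extract.ipynb
        - task_key: aggregate
          depends_on:
            - task_key: extract
          job_cluster_key: large_cluster
          notebook_task:
            notebook_path: ../src/aggregate.ipynb
        - task_key: publish
          depends_on:
            - task_key: aggregate
          job_cluster_key: small_cluster
          notebook_task:
            notebook_path: ../src/publish.ipynb
```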

This section of the documentation shows how to do it:

https://learn.microsoft.com/en-us/azure/databricks/workflows/jobs/how-to/use-bundles-with-jobs#step-5-add-a-bundle-configuration-file-to-the-project

felix_ — Thu, 14 Nov 2024 09:51:53 GMT

Hi, would it also be possible to reuse the same job cluster for multiple "Run Job" tasks?

sumitdesai — Fri, 15 Nov 2024 04:23:34 GMT

I can think of one way, if you are fine with running those jobs one after another: create a new job, add one task corresponding to each of the original jobs, and chain the tasks together. You then need to configure just one job cluster, and the same cluster should get reused by all the tasks.
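
A minimal sketch of that approach, assuming the work of each original job can run as a notebook task (job names, task keys, paths, and cluster settings are hypothetical): the tasks are chained with depends_on so they run one after another, and all of them reference a single job cluster:

```yaml
# One combined job replacing several jobs; each task stands in for one former job.
resources:
  jobs:
    combined_job:
      name: combined_job
      job_clusters:
        - job_cluster_key: shared_job_cluster
          new_cluster:
            spark_version: 13.3.x-scala2.12
            node_type_id: Standard_DS3_v2
            num_workers: 2
      tasks:
        - task_key: job_1_work
          job_cluster_key: shared_job_cluster
          notebook_task:
            notebook_path: ../src/job_1.ipynb
        - task_key: job_2_work
          depends_on:
            - task_key: job_1_work   # runs only after job_1_work finishes
          job_cluster_key: shared_job_cluster
          notebook_task:
            notebook_path: ../src/job_2.ipynb
        - task_key: job_3_work
          depends_on:
            - task_key: job_2_work
          job_cluster_key: shared_job_cluster
          notebook_task:
            notebook_path: ../src/job_3.ipynb
```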