Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to reuse a cluster with Databricks Asset bundles

sumitdesai
New Contributor II

I am using Databricks Asset Bundles as an IaC tool with Databricks. I want to create a cluster using DAB and then reuse the same cluster in multiple jobs, but I cannot find an example for this. All the examples I have found specify an individual new cluster when defining a job. How can we reuse clusters?

3 REPLIES

Wojciech_BUK
Valued Contributor III

Hello,
Jobs are specific in Databricks; a job definition also contains the cluster definition because when you run a job, a new cluster is created based on the cluster specification you provided for the job, and it exists only until the job is completed. You can define a cluster on the job level or for individual tasks. You can use the same cluster within a job, so multiple tasks can be run on the same cluster.

If you want, you can share a cluster between jobs, BUT! it will have to be an all-purpose cluster, which costs roughly twice as many DBUs. It is not recommended to use all-purpose clusters for jobs unless you have very specific needs.

Asset bundles are not very well documented yet (in public preview), so you can always refer to the API documentation:

API Documentation Link

 

In the documentation, there is an example where the script uses:

existing_cluster_id

This is the ID of the all-purpose cluster, which you can find in the cluster's JSON definition.


so in YAML it will be:

      tasks:
        - task_key: notebook_task
          existing_cluster_id: <id-of-your-existing-cluster>
But as I mentioned, it is recommended to use job clusters instead. You can define multiple job clusters, for example:

      job_clusters:
        - job_cluster_key: <some-unique-programmatic-identifier-for-this-key>
          new_cluster:
            # Cluster settings.
Then reference them WITHIN the job by setting job_cluster_key in each task specification.
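Put together, the job definition might look like this (a sketch only; the key name, cluster settings, and notebook paths are placeholders I have made up, not values from the docs):

```yaml
job_clusters:
  - job_cluster_key: shared_job_cluster      # placeholder key name
    new_cluster:
      spark_version: 13.3.x-scala2.12        # assumed cluster settings
      node_type_id: Standard_DS3_v2
      num_workers: 2
tasks:
  - task_key: first_task
    job_cluster_key: shared_job_cluster      # both tasks run on this job cluster
    notebook_task:
      notebook_path: ./src/first_notebook.py # hypothetical path
  - task_key: second_task
    job_cluster_key: shared_job_cluster
    notebook_task:
      notebook_path: ./src/second_notebook.py
```

Because both tasks point at the same job_cluster_key, the job spins up one cluster and runs both tasks on it, then tears it down when the job finishes.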

In this section of documentation you can see how you can do it:

https://learn.microsoft.com/en-us/azure/databricks/workflows/jobs/how-to/use-bundles-with-jobs#step-...

 

felix_
New Contributor II

Hi, would it also be possible to reuse the same job cluster for multiple "Run Job" Tasks?

sumitdesai
New Contributor II

I can think of a way if you are fine with running those jobs one after another: create a new job and add multiple tasks, one corresponding to each job, and chain them together. You will need to configure just one job cluster, and the same cluster should get reused by all the tasks.
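A sketch of that workaround in bundle YAML, assuming the separate jobs' notebooks are pulled in as ordinary tasks chained with depends_on (all task names, settings, and paths here are hypothetical):

```yaml
resources:
  jobs:
    combined_job:
      name: combined_job
      job_clusters:
        - job_cluster_key: shared_cluster        # the single cluster all tasks reuse
          new_cluster:
            spark_version: 13.3.x-scala2.12      # assumed cluster settings
            node_type_id: Standard_DS3_v2
            num_workers: 2
      tasks:
        - task_key: formerly_job_one             # hypothetical task name
          job_cluster_key: shared_cluster
          notebook_task:
            notebook_path: ./notebooks/job_one.py
        - task_key: formerly_job_two
          depends_on:
            - task_key: formerly_job_one         # chain so they run one after another
          job_cluster_key: shared_cluster
          notebook_task:
            notebook_path: ./notebooks/job_two.py
```

The trade-off is that the original jobs' contents move into one job; the tasks then run sequentially on a single job cluster instead of each "Run Job" task creating its own.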
