How do we create a job cluster in Databricks Asset Bundles for use across different jobs?

oakhill
New Contributor III

When developing jobs with DABs, we use new_cluster to create a cluster for a particular job. That is a lot of YAML when all I really need is a "small cluster" and a "big cluster" to reference from certain kinds of jobs. Tags would be set on the job and propagated to the cluster.

According to the dab-docs, we can create a cluster under resources: clusters: but surely we are not meant to use all-purpose clusters as job clusters?

See the example here: https://docs.databricks.com/en/dev-tools/bundles/resources.html#cluster

 

filipniziol
Esteemed Contributor

Hi @oakhill ,

You can specify your job cluster configuration in your variables:

variables:
  small_cluster_id:
    description: "The small cluster with 2 workers used by the jobs"
    type: complex
    default:
      spark_version: "15.4.x-scala2.12"
      node_type_id: "Standard_D4ds_v5"
      num_workers: 2

Now you can reference this cluster in your jobs:

resources:
  jobs:
    my_job:
      name: my_job

      job_clusters:
        - job_cluster_key: small_cluster
          new_cluster: ${var.small_cluster_id}    
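
The same pattern might extend to the "small cluster" and "big cluster" from the question. The sketch below is only an illustration: the variable names small_cluster and big_cluster, the larger node type, the worker counts, the task keys, and the notebook paths are all assumptions, not values from a real workspace.

```yaml
variables:
  small_cluster:
    description: "Small job cluster with 2 workers"
    type: complex
    default:
      spark_version: "15.4.x-scala2.12"
      node_type_id: "Standard_D4ds_v5"
      num_workers: 2
  big_cluster:
    description: "Big job cluster (hypothetical sizing)"
    type: complex
    default:
      spark_version: "15.4.x-scala2.12"
      node_type_id: "Standard_D8ds_v5"  # assumed larger node type
      num_workers: 8                    # assumed worker count

resources:
  jobs:
    my_job:
      name: my_job

      # Both cluster specs come from the variables above.
      job_clusters:
        - job_cluster_key: small_cluster
          new_cluster: ${var.small_cluster}
        - job_cluster_key: big_cluster
          new_cluster: ${var.big_cluster}

      # Each task selects a cluster by its job_cluster_key.
      tasks:
        - task_key: light_task          # hypothetical task
          job_cluster_key: small_cluster
          notebook_task:
            notebook_path: "/Users/your_user/light_notebook"
        - task_key: heavy_task          # hypothetical task
          job_cluster_key: big_cluster
          notebook_task:
            notebook_path: "/Users/your_user/heavy_notebook"
```

Each task picks its cluster through job_cluster_key, so the cluster spec itself is written only once, in the variables block.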

jkb7
New Contributor III

Nice, this method reduces the redundancy to

      job_clusters:
        - job_cluster_key: small_cluster
          new_cluster: ${var.small_cluster_id}  

which has to be repeated within each job definition. That is still a lot of repetition! How can we define a job_cluster exactly once and refer to it by name / id? In other words, what is the job-cluster equivalent of the top-level clusters resource?

saurabh18cs
Honored Contributor III

job_clusters:
  - job_cluster_key: small_cluster
    new_cluster:
      spark_version: "7.3.x-scala2.12"
      node_type_id: "i3.xlarge"
      num_workers: 2
      autotermination_minutes: 20
  - job_cluster_key: large_cluster
    new_cluster:
      spark_version: "7.3.x-scala2.12"
      node_type_id: "i3.2xlarge"
      num_workers: 10
      autotermination_minutes: 20

tasks:
  - task_key: task1
    job_cluster_key: small_cluster
    notebook_task:
      notebook_path: "/Users/your_user/task1_notebook"
  - task_key: task2
    job_cluster_key: large_cluster
    notebook_task:
      notebook_path: "/Users/your_user/task2_notebook"
  - task_key: task3
    job_cluster_key: small_cluster
    notebook_task:
      notebook_path: "/Users/your_user/task3_notebook"
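
A job_cluster_key is looked up inside the job that declares it, so the job_clusters list still has to appear in every job. One possible way to write it only once per file is a plain YAML anchor and alias. This is a sketch, not a confirmed bundle feature: it assumes the bundle's YAML loader resolves anchors and aliases, that both jobs live in the same YAML file (anchors do not cross files), and the job names job_a and job_b and the anchor name shared_job_clusters are hypothetical.

```yaml
resources:
  jobs:
    job_a:
      name: job_a
      # Define the list once and label it with an anchor (assumed to be supported).
      job_clusters: &shared_job_clusters
        - job_cluster_key: small_cluster
          new_cluster:
            spark_version: "7.3.x-scala2.12"
            node_type_id: "i3.xlarge"
            num_workers: 2
        - job_cluster_key: large_cluster
          new_cluster:
            spark_version: "7.3.x-scala2.12"
            node_type_id: "i3.2xlarge"
            num_workers: 10
      tasks:
        - task_key: task1
          job_cluster_key: small_cluster
          notebook_task:
            notebook_path: "/Users/your_user/task1_notebook"

    job_b:
      name: job_b
      # Reuse the same list through the alias instead of repeating it.
      job_clusters: *shared_job_clusters
      tasks:
        - task_key: task2
          job_cluster_key: large_cluster
          notebook_task:
            notebook_path: "/Users/your_user/task2_notebook"
```

If anchors turn out not to work in your setup, the complex-variable approach earlier in the thread remains the fallback.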