Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How do we create a job cluster in Databricks Asset Bundles for use across different jobs?

oakhill
New Contributor III

When developing jobs with DABs, we use new_cluster to create a cluster for a particular job. That is a lot of YAML lines when all I really need is a "small cluster" and a "big cluster" to reference for certain kinds of jobs. Tags would be set on the job and propagated to the cluster.

According to the DAB docs, we can create a cluster under resources: clusters:, but surely we are not meant to use all-purpose clusters as job clusters?

See the example here: https://docs.databricks.com/en/dev-tools/bundles/resources.html#cluster

 

3 REPLIES

filipniziol
Esteemed Contributor

Hi @oakhill ,

You can specify your job cluster configuration in your variables:

variables:
  small_cluster_id:
    description: "The small cluster with 2 workers used by the jobs"
    type: complex
    default:
      spark_version: "15.4.x-scala2.12"
      node_type_id: "Standard_D4ds_v5"
      num_workers: 2

Now you can reference this cluster definition in your jobs:

resources:
  jobs:
    my_job:
      name: my_job

      job_clusters:
        - job_cluster_key: small_cluster
          new_cluster: ${var.small_cluster_id}    
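
A rough sketch of how a task could then pick up that cluster through its key. The task key and notebook path below are placeholders for illustration, not something from the original setup:

```yaml
resources:
  jobs:
    my_job:
      name: my_job

      job_clusters:
        - job_cluster_key: small_cluster
          new_cluster: ${var.small_cluster_id}

      tasks:
        - task_key: example_task            # hypothetical task name
          job_cluster_key: small_cluster    # must match the key declared under job_clusters
          notebook_task:
            notebook_path: ../src/example_notebook.ipynb  # hypothetical path
```

The job_cluster_key is only a label local to this job, so the same key can be reused in other jobs without conflict.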

jkb7
New Contributor III

Nice, this method reduces the redundancy to

      job_clusters:
        - job_cluster_key: small_cluster
          new_cluster: ${var.small_cluster_id}  

which has to be repeated within each job definition. But this redundancy is still a lot! How can we define a job_cluster exactly once and refer to it by name / id? In other words, what is the job-cluster equivalent to the top-level resource cluster?

saurabh18cs
Honored Contributor

job_clusters:
  - job_cluster_key: small_cluster
    new_cluster:
      spark_version: "7.3.x-scala2.12"
      node_type_id: "i3.xlarge"
      num_workers: 2
      autotermination_minutes: 20
  - job_cluster_key: large_cluster
    new_cluster:
      spark_version: "7.3.x-scala2.12"
      node_type_id: "i3.2xlarge"
      num_workers: 10
      autotermination_minutes: 20

tasks:
  - task_key: task1
    job_cluster_key: small_cluster
    notebook_task:
      notebook_path: "/Users/your_user/task1_notebook"
  - task_key: task2
    job_cluster_key: large_cluster
    notebook_task:
      notebook_path: "/Users/your_user/task2_notebook"
  - task_key: task3
    job_cluster_key: small_cluster
    notebook_task:
      notebook_path: "/Users/your_user/task3_notebook"
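
Combining the two approaches in this thread, here is a minimal sketch of a "small" and a "large" cluster defined once as complex variables and reused by two different jobs. The variable names, job names, notebook paths and the large node type are assumptions made up for illustration. As far as I can tell, each job still has to declare its own job_clusters entry, so the variable removes the repeated cluster body but not the two-line reference:

```yaml
variables:
  small_cluster:                      # hypothetical variable name
    description: "Small job cluster with 2 workers"
    type: complex
    default:
      spark_version: "15.4.x-scala2.12"
      node_type_id: "Standard_D4ds_v5"
      num_workers: 2
  large_cluster:                      # hypothetical variable name
    description: "Large job cluster with 10 workers"
    type: complex
    default:
      spark_version: "15.4.x-scala2.12"
      node_type_id: "Standard_D8ds_v5"   # assumed node type, adjust to your cloud
      num_workers: 10

resources:
  jobs:
    job_a:                            # hypothetical job
      name: job_a
      job_clusters:
        - job_cluster_key: small_cluster
          new_cluster: ${var.small_cluster}
      tasks:
        - task_key: task1
          job_cluster_key: small_cluster
          notebook_task:
            notebook_path: ../src/task1_notebook.ipynb   # hypothetical path

    job_b:                            # hypothetical job
      name: job_b
      job_clusters:
        - job_cluster_key: large_cluster
          new_cluster: ${var.large_cluster}
      tasks:
        - task_key: task2
          job_cluster_key: large_cluster
          notebook_task:
            notebook_path: ../src/task2_notebook.ipynb   # hypothetical path
```

Changing the cluster size for every job that uses it then means editing only the variable's default.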
