How do we create a job cluster in Databricks Asset Bundles for use across different jobs?
12-04-2024 03:31 PM
When developing jobs with DABs, we use new_cluster to create a cluster for a particular job. That is a lot of YAML when all I really need is a "small cluster" and a "big cluster" to reference for certain kinds of jobs. Tags would be set on the job and propagated to the cluster.
According to the dab-docs, we can create a cluster under resources: clusters: but surely we are not meant to use all-purpose clusters as a job cluster?
See the example here: https://docs.databricks.com/en/dev-tools/bundles/resources.html#cluster
12-05-2024 12:56 AM
Hi @oakhill ,
You can specify your job cluster configuration in your variables:
variables:
  small_cluster_id:
    description: "The small cluster with 2 workers used by the jobs"
    type: complex
    default:
      spark_version: "15.4.x-scala2.12"
      node_type_id: "Standard_D4ds_v5"
      num_workers: 2
Now you can use this cluster in your jobs:
resources:
  jobs:
    my_job:
      name: my_job
      job_clusters:
        - job_cluster_key: small_cluster
          new_cluster: ${var.small_cluster_id}
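A minimal sketch of how a task inside that job could then select the cluster by its key. The task key and notebook path are hypothetical placeholders; the rest repeats the job definition from this reply.

```yaml
resources:
  jobs:
    my_job:
      name: my_job
      job_clusters:
        - job_cluster_key: small_cluster
          new_cluster: ${var.small_cluster_id}
      tasks:
        # Hypothetical task; it runs on the job cluster declared above.
        - task_key: example_task
          job_cluster_key: small_cluster
          notebook_task:
            notebook_path: "/Users/your_user/example_notebook"
```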
01-27-2025 12:41 AM
Nice, this method reduces the redundancy to
job_clusters:
  - job_cluster_key: small_cluster
    new_cluster: ${var.small_cluster_id}
which has to be repeated within each job definition. But that is still a lot of redundancy! How can we define a job_cluster exactly once and refer to it by name / id? In other words, what is the job-cluster equivalent of the top-level resource cluster?
01-27-2025 03:19 AM
job_clusters:
  - job_cluster_key: small_cluster
    new_cluster:
      spark_version: "7.3.x-scala2.12"
      node_type_id: "i3.xlarge"
      num_workers: 2
      autotermination_minutes: 20
  - job_cluster_key: large_cluster
    new_cluster:
      spark_version: "7.3.x-scala2.12"
      node_type_id: "i3.2xlarge"
      num_workers: 10
      autotermination_minutes: 20
tasks:
  - task_key: task1
    job_cluster_key: small_cluster
    notebook_task:
      notebook_path: "/Users/your_user/task1_notebook"
  - task_key: task2
    job_cluster_key: large_cluster
    notebook_task:
      notebook_path: "/Users/your_user/task2_notebook"
  - task_key: task3
    job_cluster_key: small_cluster
    notebook_task:
      notebook_path: "/Users/your_user/task3_notebook"
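One way to combine this with the complex-variable approach from earlier in the thread is sketched below: each cluster spec lives in one variable, and every job only repeats the short job_clusters stanza. The variable names small_cluster and large_cluster, the job name, and all cluster values are illustrative assumptions, not a confirmed recommendation. Each job still declares its own job_clusters list, because a job_cluster_key is only visible to the tasks of the job that defines it.

```yaml
variables:
  small_cluster:            # hypothetical variable name
    description: "Small job cluster spec shared by several jobs"
    type: complex
    default:
      spark_version: "15.4.x-scala2.12"   # placeholder value
      node_type_id: "i3.xlarge"           # placeholder value
      num_workers: 2
  large_cluster:            # hypothetical variable name
    description: "Large job cluster spec shared by several jobs"
    type: complex
    default:
      spark_version: "15.4.x-scala2.12"   # placeholder value
      node_type_id: "i3.2xlarge"          # placeholder value
      num_workers: 10

resources:
  jobs:
    example_job:            # hypothetical job
      name: example_job
      job_clusters:
        - job_cluster_key: small_cluster
          new_cluster: ${var.small_cluster}
        - job_cluster_key: large_cluster
          new_cluster: ${var.large_cluster}
      tasks:
        - task_key: task1
          job_cluster_key: small_cluster
          notebook_task:
            notebook_path: "/Users/your_user/task1_notebook"
        - task_key: task2
          job_cluster_key: large_cluster
          notebook_task:
            notebook_path: "/Users/your_user/task2_notebook"
```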