Board: Data Engineering

How do we create a job cluster in Databricks Asset Bundles for use across different jobs?

oakhill — Wed, 04 Dec 2024 23:31:08 GMT

When developing jobs with DABs, we use new_cluster to create a cluster for each job. That's a lot of lines of YAML when all I really need is a "small cluster" and a "big cluster" that certain kinds of jobs can reference. Tags would be set on the job and propagated to the cluster.

According to the DABs docs, we can define a cluster under resources: clusters:, but surely we are not meant to use all-purpose clusters as job clusters?

See the example here: https://docs.databricks.com/en/dev-tools/bundles/resources.html#cluster
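For comparison, the documented resources: clusters: route looks roughly like the sketch below: the bundle deploys one all-purpose cluster, and each task points at it through existing_cluster_id instead of declaring a job cluster. The resource key shared_small, the cluster settings and the notebook path are placeholders, and the sketch assumes your CLI version supports referencing the deployed cluster as ${resources.clusters.shared_small.id}. Keep in mind that all-purpose compute is billed at a higher rate than jobs compute, which is one reason to hesitate before using it for scheduled jobs.

```yaml
resources:
  clusters:
    shared_small:                       # hypothetical resource key
      cluster_name: shared_small
      spark_version: "15.4.x-scala2.12"
      node_type_id: "Standard_D4ds_v5"
      num_workers: 2
      autotermination_minutes: 20       # stop the cluster after 20 idle minutes
  jobs:
    my_job:
      name: my_job
      tasks:
        - task_key: main
          # reuse the bundle-managed all-purpose cluster instead of a job cluster
          existing_cluster_id: ${resources.clusters.shared_small.id}
          notebook_task:
            notebook_path: ./src/main.ipynb   # hypothetical path, relative to this file
```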

Re: How do we create a job cluster in Databricks Asset Bundles for use across different jobs?

filipniziol — Thu, 05 Dec 2024 08:56:11 GMT

Hi @oakhill,

You can specify your job cluster configuration as a complex variable:

variables:
  small_cluster_id:
    description: "The small cluster with 2 workers used by the jobs"
    type: complex
    default:
      spark_version: "15.4.x-scala2.12"
      node_type_id: "Standard_D4ds_v5"
      num_workers: 2

Then reference that configuration from your jobs:

resources:
  jobs:
    my_job:
      name: my_job
      job_clusters:
        - job_cluster_key: small_cluster
          new_cluster: ${var.small_cluster_id}
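Applied to the original "small cluster" / "big cluster" idea, a sketch might define one complex variable per size and let each job pick one, with tags set on the job itself (Databricks documents that job tags propagate to the job clusters a run creates). The variable names small_cluster and big_cluster, the job names, the worker counts, the Azure node types and the notebook paths are all hypothetical placeholders; complex variables also need a reasonably recent Databricks CLI.

```yaml
variables:
  small_cluster:                        # hypothetical variable name
    description: "Small job cluster with 2 workers"
    type: complex
    default:
      spark_version: "15.4.x-scala2.12"
      node_type_id: "Standard_D4ds_v5"
      num_workers: 2
  big_cluster:                          # hypothetical variable name
    description: "Big job cluster with 8 workers"
    type: complex
    default:
      spark_version: "15.4.x-scala2.12"
      node_type_id: "Standard_D8ds_v5"
      num_workers: 8

resources:
  jobs:
    light_job:
      name: light_job
      tags:
        team: data-eng                  # job tags propagate to the job cluster
      job_clusters:
        - job_cluster_key: main
          new_cluster: ${var.small_cluster}
      tasks:
        - task_key: ingest
          job_cluster_key: main
          notebook_task:
            notebook_path: ./src/ingest.ipynb      # hypothetical path
    heavy_job:
      name: heavy_job
      tags:
        team: data-eng
      job_clusters:
        - job_cluster_key: main
          new_cluster: ${var.big_cluster}
      tasks:
        - task_key: transform
          job_cluster_key: main
          notebook_task:
            notebook_path: ./src/transform.ipynb   # hypothetical path
```

Changing a size then means editing one variable default rather than every job.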

Re: How do we create a job cluster in Databricks Asset Bundles for use across different jobs?

jkb7 — Mon, 27 Jan 2025 08:41:51 GMT

Nice, this method reduces the redundancy to

job_clusters:
  - job_cluster_key: small_cluster
    new_cluster: ${var.small_cluster_id}

which still has to be repeated in every job definition, and that is still a lot of redundancy. How can we define a job cluster exactly once and refer to it by name or ID? In other words, what is the job-cluster equivalent of the top-level cluster resource?

Re: How do we create a job cluster in Databricks Asset Bundles for use across different jobs?

saurabh18cs — Mon, 27 Jan 2025 11:19:21 GMT

job_clusters:
  - job_cluster_key: small_cluster
    new_cluster:
      spark_version: "7.3.x-scala2.12"
      node_type_id: "i3.xlarge"
      num_workers: 2
      autotermination_minutes: 20
  - job_cluster_key: large_cluster
    new_cluster:
      spark_version: "7.3.x-scala2.12"
      node_type_id: "i3.2xlarge"
      num_workers: 10
      autotermination_minutes: 20

tasks:
  - task_key: task1
    job_cluster_key: small_cluster
    notebook_task:
      notebook_path: "/Users/your_user/task1_notebook"
  - task_key: task2
    job_cluster_key: large_cluster
    notebook_task:
      notebook_path: "/Users/your_user/task2_notebook"
  - task_key: task3
    job_cluster_key: small_cluster
    notebook_task:
      notebook_path: "/Users/your_user/task3_notebook"
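The fragment above shares two job clusters among the tasks of one job, which is as far as job clusters go: as far as I know, a job cluster exists only for a single run of a single job, so there is no top-level job-cluster resource that several jobs can share. To avoid retyping the job_clusters block, standard YAML anchors and aliases can reuse it within one file. This sketch assumes the Databricks CLI's YAML loader resolves anchors (it should, but only inside a single file, not across included files) and reuses the small_cluster_id variable from earlier in the thread; the anchor name, job names and notebook paths are hypothetical.

```yaml
# Hypothetical resources/jobs.yml: anchors and aliases only resolve within this one file
resources:
  jobs:
    job_a:
      name: job_a
      # "&small_job_clusters" names this list so later jobs can reuse it
      job_clusters: &small_job_clusters
        - job_cluster_key: small_cluster
          new_cluster: ${var.small_cluster_id}
      tasks:
        - task_key: main
          job_cluster_key: small_cluster
          notebook_task:
            notebook_path: ../src/job_a.ipynb   # hypothetical path
    job_b:
      name: job_b
      # "*small_job_clusters" is an alias: the parser substitutes the same list here
      job_clusters: *small_job_clusters
      tasks:
        - task_key: main
          job_cluster_key: small_cluster
          notebook_task:
            notebook_path: ../src/job_b.ipynb   # hypothetical path
```

Each job still starts its own cluster for every run; the alias only saves typing. A single running cluster shared by several jobs is what all-purpose clusters are for.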