Hello Databricks Community!
I'm working on a project where I need to deploy jobs in a specific sequence using Databricks Asset Bundles. Some of my jobs (let's call them coordination jobs) depend on other jobs (base jobs) and look up their job IDs via lookup variables. I'm struggling to find the best way to ensure the base jobs are deployed first, so that the coordination jobs' lookups resolve successfully at deploy time.
Here's a simplified version of my databricks.yml file:
bundle:
  name: my_project

variables:
  deploy_env:
    description: Deployment environment name (set per target below)

  workspace_path:
    value: /Workspace/my_project/files/src

  # Cluster lookups
  cluster_large:
    description: Large Cluster
    lookup:
      cluster: "large-cluster"
  cluster_small:
    description: Small Cluster
    lookup:
      cluster: "small-cluster"

  # Job lookups
  base_job_1:
    lookup:
      job: base_job_1
  base_job_2:
    lookup:
      job: base_job_2
  base_job_3:
    lookup:
      job: base_job_3
include:
  - ./resources/workflows/base_jobs/*.yml
  - ./resources/workflows/coordination_jobs/*.yml

targets:
  dev:
    mode: development
    default: true
    variables:
      deploy_env: dev
    workspace:
      host: https://my-dev-workspace.cloud.databricks.com/
  prod:
    mode: production
    default: false
    variables:
      deploy_env: prod
    workspace:
      host: https://my-prod-workspace.cloud.databricks.com/
      root_path: /Workspace/${bundle.name}/
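The base jobs live under resources/workflows/base_jobs and are ordinary job definitions in the same bundle. Here's a trimmed-down sketch of base_job_1.yml (the task details below are simplified placeholders, not my real tasks):

resources:
  jobs:
    base_job_1:
      name: base_job_1
      tasks:
        # Placeholder task; the real job has more steps
        - task_key: main
          notebook_task:
            notebook_path: ${var.workspace_path}/base_job_1
            source: WORKSPACE
          existing_cluster_id: ${var.cluster_large}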
And here's an example of one of my coordination jobs (coordination_job_1.yml):
resources:
  jobs:
    coordination_job_1:
      name: coordination_job_1
      tasks:
        - task_key: run_base_job_1
          run_job_task:
            job_id: ${var.base_job_1}
        - task_key: run_base_job_2
          depends_on:
            - task_key: run_base_job_1
          run_job_task:
            job_id: ${var.base_job_2}
        - task_key: final_task
          depends_on:
            - task_key: run_base_job_2
          notebook_task:
            notebook_path: ${var.workspace_path}/final_task
            source: WORKSPACE
          existing_cluster_id: ${var.cluster_small}
      tags:
        deploy_env: ${var.deploy_env}
      queue:
        enabled: false
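For context, I'm not doing anything unusual at deploy time, just the standard bundle commands against a single target, roughly:

# Validate and deploy the whole bundle to the dev target.
# The job lookup variables (base_job_1/2/3) have to resolve against
# the workspace at this point, which is where the ordering problem bites.
databricks bundle validate -t dev
databricks bundle deploy -t dev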
My questions are:
1. How can I ensure that the base jobs are deployed before the coordination jobs?
2. Is there a way to structure my databricks.yml, or to use Databricks CLI commands, to enforce this deployment order? (One idea I'm unsure about is sketched right after these questions.)
3. Are there best practices for managing these kinds of job dependencies in Databricks Asset Bundles?
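To make question 2 concrete: since the base jobs are defined in this same bundle, I've wondered whether I should drop the lookup variables entirely and reference the bundle resources directly, something like the fragment below. I'm assuming ${resources.jobs.<name>.id} resolves to the deployed job's ID, but I'm not sure this is the right approach, which is part of what I'm asking.

# Hypothetical alternative inside coordination_job_1.yml: reference the
# base job defined in this bundle instead of using a lookup variable.
tasks:
  - task_key: run_base_job_1
    run_job_task:
      job_id: ${resources.jobs.base_job_1.id}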
Any insights or suggestions would be greatly appreciated! Thank you in advance for your help.