Data Engineering

Sequencing Job Deployments with Databricks Asset Bundles

Stephanos
New Contributor

Hello Databricks Community!
I'm working on a project where I need to deploy jobs in a specific sequence using Databricks Asset Bundles. Some of my jobs (let's call them coordination jobs) depend on other jobs (base jobs) and need to look up their job IDs. I'm struggling to find the best way to ensure that the base jobs are deployed first so that the coordination jobs can successfully perform their variable lookups.
Here's a simplified version of my databricks.yml file:

bundle:
  name: my_project

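variables:
  workspace_path:
    # Variable definitions take `default`, not `value`
    default: /Workspace/my_project/files/src

  # Declared here so the per-target overrides under `targets` are valid
  deploy_env:
    description: Deployment environment (dev or prod)
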
  # Cluster lookups
  cluster_large:
    description: Large Cluster
    lookup:
      cluster: "large-cluster"
  cluster_small:
    description: Small Cluster
    lookup:
      cluster: "small-cluster"

  # Job lookups
  base_job_1:
    lookup:
      job: base_job_1
  base_job_2:
    lookup:
      job: base_job_2
  base_job_3:
    lookup:
      job: base_job_3

include:
  - ./resources/workflows/base_jobs/*.yml
  - ./resources/workflows/coordination_jobs/*.yml

targets:
  dev:
    mode: development
    default: true
    variables:
      deploy_env: dev
    workspace:
      host: https://my-dev-workspace.cloud.databricks.com/
  
  prod:
    mode: production
    default: false
    variables:
      deploy_env: prod
    workspace:
      host: https://my-prod-workspace.cloud.databricks.com/
      root_path: /Workspace/${bundle.name}/

And here's an example of one of my coordination jobs (coordination_job_1.yml):

resources:
  jobs:
    coordination_job_1:
      name: coordination_job_1
      tasks:
        - task_key: run_base_job_1
          run_job_task:
            job_id: ${var.base_job_1}
        - task_key: run_base_job_2
          depends_on:
            - task_key: run_base_job_1
          run_job_task:
            job_id: ${var.base_job_2}
        - task_key: final_task
          depends_on:
            - task_key: run_base_job_2
          notebook_task:
            notebook_path: ${var.workspace_path}/final_task
            source: WORKSPACE
          existing_cluster_id: ${var.cluster_small}
      tags:
        deploy_env: ${var.deploy_env}
      queue:
        enabled: false

My questions are:

1. How can I ensure that the base jobs are deployed before the coordination jobs?
2. Is there a way to structure my databricks.yml or use Databricks CLI commands to enforce this deployment order?
3. Are there best practices for managing these kinds of job dependencies in Databricks Asset Bundles?

Any insights or suggestions would be greatly appreciated! Thank you in advance for your help.
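
A minimal sketch of one way the ordering problem above might be approached, assuming the base jobs are defined in this same bundle under the resource keys base_job_1 and base_job_2 (the base_jobs files are not shown, so those keys are an assumption). A lookup variable is resolved by name against jobs that already exist in the workspace, so it can fail on a first deployment. Referencing the bundle resource directly with ${resources.jobs.<key>.id} lets the deployment resolve the IDs and work out the creation order itself:

```yaml
# resources/workflows/coordination_jobs/coordination_job_1.yml
# Sketch only: assumes base_job_1 and base_job_2 are declared under
# resources.jobs in the same bundle (hypothetical resource keys).
resources:
  jobs:
    coordination_job_1:
      name: coordination_job_1
      tasks:
        - task_key: run_base_job_1
          run_job_task:
            # Direct resource reference instead of ${var.base_job_1}
            job_id: ${resources.jobs.base_job_1.id}
        - task_key: run_base_job_2
          depends_on:
            - task_key: run_base_job_1
          run_job_task:
            job_id: ${resources.jobs.base_job_2.id}
```

With references like these, the job lookup variables in databricks.yml would likely no longer be needed for jobs owned by the bundle; lookups remain useful for jobs managed outside it. If the base jobs live in a separate bundle instead, running databricks bundle deploy -t dev for that bundle first and for the coordination bundle second is another option.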

1 REPLY

MohcineRouessi
New Contributor II

Hey Steph, have you found a solution for this? I'm currently stuck trying to achieve the same thing 🙏
