How to Version & Deploy Databricks Workflows with Azure DevOps (CI/CD)?

mkEngineer
New Contributor III

Hi everyone,

I’m trying to set up versioning and CI/CD for my Databricks workflows using Azure DevOps and Git. While I’ve successfully versioned notebooks in a Git repo, I’m struggling with handling workflows (which define orchestration, dependencies, schema, etc.).

How can I properly version and deploy Databricks workflows across different environments (Dev, Test, Prod) using Azure DevOps?

Thanks in advance!

3 REPLIES

mkEngineer
New Contributor III

@Alberto_Umana and @nicole_lu_PM maybe you have a clue? Would DABs be useful in this case? 

 

mkEngineer
New Contributor III

As of now, my approach is to manually copy/paste the workflow YAMLs across workspaces and version them in Git/Azure DevOps by saving them as DBFS files. The CD process is then handled by the Databricks DBFS File Deployment task from Data Thirst Ltd.

While this works, I’m still looking for a more automated and scalable solution. Has anyone found a better way to manage Databricks workflow versioning and deployment in a CI/CD setup? Would love to hear your insights!

mark_ott
Databricks Employee

To properly version and deploy Databricks workflows—including orchestration, dependencies, and environment management—across Dev, Test, and Prod using Azure DevOps, follow these best practices and patterns:

Versioning Databricks Workflows

  • Store Databricks notebooks, scripts, and workflow configuration (such as Databricks Asset Bundles with databricks.yml) in a Git repository.

  • Use structured folders: separate directories for notebooks, scripts, configuration files, and bundle definitions (a sample layout follows this list).

  • Commit each change (workflow logic, dependencies, orchestration configs) to version control and use Git branches for environment-specific workflows and stable releases.
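For reference, a layout along these lines keeps notebooks, packages, tests, and pipeline definitions separate (all names here are illustrative, not prescribed):

```text
├── databricks.yml          # bundle definition: jobs, targets, variables
├── notebooks/              # notebook source files
├── src/                    # custom Python packages/scripts
├── tests/                  # unit tests run by the build pipeline
└── pipelines/
    └── azure-pipelines.yml # build/release pipeline definition
```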

Automated CI/CD with Azure DevOps

Core Approach:

  • Use Databricks Asset Bundles (DABs) and the Databricks CLI for defining and deploying jobs and their orchestration in code (a CLI install step is sketched after this list).

  • Set up two separate pipelines in Azure DevOps: a build pipeline (prepares artifacts) and a release pipeline (deploys and runs workflows).
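Both pipelines need the Databricks CLI on the agent. A minimal install step for a Linux agent, using the install script Databricks publishes, might look like this:

```yaml
steps:
  # Install the Databricks CLI on the build agent (official install script)
  - script: |
      curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
    displayName: "Install Databricks CLI"
```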

Steps for End-to-End CI/CD

1. Organize Assets in Git

  • Put all notebooks, custom scripts/packages, and databricks.yml in your repository.

  • The databricks.yml bundle describes jobs, dependencies, clusters, variables, and targets (for Dev, Test, Prod) in a declarative YAML format.

2. Define Build Pipeline

  • The pipeline pulls the latest source from Git, runs tests (a sample test step is sketched after this step), packages any libraries (like Python wheels), and creates a zipped deployment artifact.

  • Example YAML task (simplified):

```yaml
trigger:
  - release

pool:
  vmImage: ubuntu-latest

steps:
  - checkout: self
  - script: |
      # Custom logic to prepare artifacts
      mkdir -p $(Build.ArtifactStagingDirectory)
      cp -R * $(Build.ArtifactStagingDirectory)/
    displayName: "Prepare Artifacts"
  - task: PublishBuildArtifacts@1
    inputs:
      ArtifactName: 'DatabricksBuild'
```
  • Store the pipeline YAML with the repo for versioning.
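To make the "runs tests" part concrete, a unit-test step could run before the artifact is published. This is only a sketch; the requirements.txt file and tests/ directory are assumptions about your repo layout:

```yaml
  # Run unit tests before packaging (assumes a Python project using pytest)
  - script: |
      pip install -r requirements.txt pytest
      pytest tests/ --junitxml=$(Build.ArtifactStagingDirectory)/test-results.xml
    displayName: "Run Unit Tests"
```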

3. Define Release Pipeline

  • Unpacks and deploys the artifact using the Databricks CLI.

  • Uses environment variables to switch between deployment targets; for example:

```bash
databricks bundle deploy -t dev
databricks bundle deploy -t test
databricks bundle deploy -t prod
```
  • Executes Databricks jobs after deployment for validation or smoke tests.

  • Securely configure service principal/application credentials for Databricks API access (a combined deploy-and-run step is sketched after this list).
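Putting those pieces together, a release step might validate, deploy, and smoke-test in one script. The pipeline variables ($(targetEnv), $(servicePrincipalId), etc.) and the job key my-job are assumptions; the ARM_* environment variables are how the Databricks CLI picks up Azure service principal credentials:

```yaml
  # Deploy the bundle for the chosen target, then run a smoke-test job.
  # $(targetEnv) and the credential variables are hypothetical pipeline variables.
  - script: |
      databricks bundle validate -t $(targetEnv)
      databricks bundle deploy -t $(targetEnv)
      databricks bundle run -t $(targetEnv) my-job
    displayName: "Deploy and Smoke-Test Bundle"
    env:
      ARM_CLIENT_ID: $(servicePrincipalId)
      ARM_CLIENT_SECRET: $(servicePrincipalSecret)
      ARM_TENANT_ID: $(tenantId)
```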

4. Promote Artifacts Between Environments

  • Artifacts tested in Dev are promoted to Test and then Prod by reusing the same release pipeline with different target parameters, ensuring consistency and immutability (one possible multi-stage layout is sketched below).
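One way to express that promotion is a multi-stage pipeline in which every stage reuses the same job template and passes a different bundle target. The template file deploy-bundle.yml is hypothetical:

```yaml
# Each stage reuses one (hypothetical) deploy template; only the target varies,
# so the artifact promoted to Prod is the one validated in Dev and Test.
stages:
  - stage: DeployDev
    jobs:
      - template: deploy-bundle.yml
        parameters:
          targetEnv: dev
  - stage: DeployTest
    dependsOn: DeployDev
    jobs:
      - template: deploy-bundle.yml
        parameters:
          targetEnv: test
  - stage: DeployProd
    dependsOn: DeployTest
    jobs:
      - template: deploy-bundle.yml
        parameters:
          targetEnv: prod
```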

Workflow Configuration Example (databricks.yml)

  • Example bundle for jobs/dependencies:

```yaml
bundle:
  name: my-workflow

targets:
  dev:
    mode: development
    workspace:
      host: https://adb-xxx.azuredatabricks.net
  prod:
    mode: production
    workspace:
      host: https://adb-yyy.azuredatabricks.net

resources:
  jobs:
    my-job:
      name: My Workflow Job
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ./notebooks/my_notebook.py
          new_cluster:
            spark_version: "13.3.x-scala2.12"
            node_type_id: Standard_DS3_v2
            num_workers: 1
```
  • Switch the deployment environment via a CLI/DevOps pipeline variable: databricks bundle deploy -t prod.

Key Tools and Tips

  • Use Databricks CLI in non-interactive mode within pipelines for deployments and validation.

  • Parameterize cluster/node/job settings via YAML for each environment (see the variables sketch after this list).

  • Test locally with the CLI (for example, databricks bundle validate) to check your workflow syntax before running it through the pipeline.

  • Store secrets (Azure Databricks token, service principal credentials) securely using Azure DevOps secrets.
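As an illustration of that per-environment parameterization, Asset Bundle variables let a single databricks.yml vary cluster settings by target. The variable name and the larger Prod node type below are assumptions:

```yaml
# A bundle variable with a default, overridden only for the prod target
bundle:
  name: my-workflow

variables:
  node_type:
    default: Standard_DS3_v2

targets:
  prod:
    variables:
      node_type: Standard_DS5_v2   # hypothetical larger node for Prod

resources:
  jobs:
    my-job:
      name: My Workflow Job
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ./notebooks/my_notebook.py
          new_cluster:
            spark_version: "13.3.x-scala2.12"
            node_type_id: ${var.node_type}
            num_workers: 1
```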

This approach offers clear, reproducible promotion of workflow definitions, orchestration, dependencies, and environment settings, fully automated within Azure DevOps and under version control.

For a detailed Microsoft walkthrough with sample files and pipeline YAML, see the official documentation.