Establishing a trusted Continuous Integration/Continuous Deployment (CI/CD) process is crucial for effectively managing the lifecycle of your data and AI workloads in Azure Databricks. However, with numerous setup options and a constantly evolving platform, the big question is, "Where do I even start?" This comprehensive blog cuts through the complexity to outline a recommended CI/CD approach for Azure Databricks.
We focus on the latest best practices that support strong governance with minimal administrative overhead, even when supporting multiple teams in a shared infrastructure. It's tailored for developers who enjoy working in the Databricks workspace UI, though it also fully supports IDEs like Visual Studio Code.
The recommended approach uses Declarative Automation Bundles (DABs; formerly known as Databricks Asset Bundles) for deployment. Bundles integrate seamlessly with Databricks Git Folders for version control, Azure DevOps for automation, and Azure DevOps service connections for Databricks connectivity. At the end of the blog, we compare the DevOps service connection and Databricks OIDC approaches for integrating DevOps and Databricks.
Finally, we’ll follow a simple but effective trunk-based branching strategy in DevOps. Staging will be deployed automatically after feature branch pull requests are approved and merged into main. Production deployment from main will be contingent on a successful staging deployment and will require an additional approval step, which we will implement.
This blog covers a wide range of topics, but by guiding you toward the optimal choices, it will save you significant time and effort in the long run. The content is divided into three parts:
Every organization needs to decide what level of separation and organization is required for Databricks workspaces, deployment environments, Unity Catalog, DevOps projects, and Git repositories. There is no one right answer for every organization, and we can only touch on the considerations before moving on to the article's topic.
| Topic | Consideration | For This Article |
| --- | --- | --- |
| Environments | Three is common, but some may use two, and others four or more. | We'll use dev, staging, and prod. |
| Databricks Workspaces | Generally, workspaces align with environments. Larger organizations may also organize business units into separate workspaces. An advantage of three workspaces is that the workspace admin can freely enable preview features and other settings in the dev workspace without affecting staging and prod workloads. | We'll show how multiple independent teams can use two workspaces (bu1-dev and bu1-prod) with reasonable governance. Staging will deploy to a secure folder in the dev workspace. You can also follow along with three workspaces, one each for dev, staging, and prod. |
| Catalogs | There are many strategies for organizing your data in Unity Catalog. | We'll name our catalogs bu1_dev, bu1_stg, and bu1_prd, demonstrating how independent business units can operate within a single Unity Catalog metastore. |
| Azure DevOps Projects | Smaller organizations use a single project to simplify setup, while larger organizations may require multiple projects to create hard boundaries between code repositories and approvers. | We'll use a project called bu1-team1, allowing independent teams (e.g., HR, Legal, Finance) within a "corporate" BU to operate securely within the shared workspaces alongside other BU teams. |
| Git Repository | There are pros and cons to a single "mono repo" versus multiple team- or domain-based repos. | Since we have already selected a team-based DevOps project, we'll use its single default repo. |
Note: I encourage you to follow the example and organizational strategy outlined in this blog to see how the whole process works together. Then consider what you learned, how your organization's governance needs may differ, and make the necessary changes. With these choices made, let's look at our organizational strategy visually.
Within our “corporate” organization (BU1), Team 2 handles HR data, and everything else is handled by Team 1. These two teams will work independently in our shared bu1-dev and bu1-prod workspaces. For this to work well, it is important to note that one or more individuals must be trusted by both teams to serve as workspace Administrators. If this can be arranged, infrastructure management is simplified; if not, teams 1 and 2 should be separated into their own dev/prod workspaces.
Down the road, another division (BU2) could be acquired that shares the same Entra ID directory but elects to have separate workspaces and DevOps infrastructure. That’s okay, and it's the reason we’ll prefix our catalog names in Unity Catalog (e.g., bu1_dev, bu2_dev, …).
We’ll align with the “mono repo” approach within each DevOps project and further organize our code using Declarative Automation Bundles. In this blog, we’ll use a single “Hello World” bundle to illustrate how the recommended development and CI/CD processes work. In practice, you will have many bundles for a given data domain (e.g., finance), and these can all be versioned in the same repo. We’ll use a single DevOps pipeline for staging and production deployments, but you may choose a different pipeline strategy.
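To make the mono-repo idea concrete, here is a small sketch (plain Python, with illustrative folder names that are not from the article) of how a script could enumerate every bundle in the repo, since each bundle is rooted at its own databricks.yml file:

```python
import tempfile
from pathlib import Path

def find_bundles(repo_root: Path) -> list[Path]:
    """Return the folder of every bundle in the repo (one databricks.yml per bundle)."""
    return sorted(p.parent for p in repo_root.rglob("databricks.yml"))

# Build an illustrative mono-repo layout: two bundles, one nested under a domain folder.
root = Path(tempfile.mkdtemp())
for name in ("hello_world", "finance/ingest_gl"):
    bundle_dir = root / name
    bundle_dir.mkdir(parents=True)
    (bundle_dir / "databricks.yml").write_text(f"bundle:\n  name: {bundle_dir.name}\n")

print([b.name for b in find_bundles(root)])  # -> ['ingest_gl', 'hello_world']
```

A deployment pipeline could use a listing like this to validate and deploy each bundle in turn, or you can keep pipelines 1:1 with bundles for finer control.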
Databricks bundles and Git folders work seamlessly together with Azure DevOps, and set you up for success in your CI/CD process. Let’s jump into it.
This is pretty straightforward and involves creating the team's Azure DevOps project and initializing the repo. You do not need to create any branches; we will use the built-in main branch and create our feature branches within Databricks using the built-in Git integration UI.
Create your project, initialize the main branch, and then go to the repo and copy the repository URL to your clipboard for the next step.
In the Databricks workspace, each developer creates their own Azure DevOps (MSFT Entra) linked account credential, allowing them to authenticate to the Git repository. Do this now in your “dev” workspace on the Settings > User > Linked accounts page. You do not need to do this in your prod workspace because there is no Git integration required there (DevOps will push the code to prod).
In their Home directory, each developer creates their own Git folder that points to the team’s repo. This step is done by each developer only once. Click Create > Git folder.
Paste in the repo URL from your Azure DevOps project (should still be on your clipboard), then click Create Git folder.
You will notice that Git folders have special properties in the Workspace UI, including these two buttons.
Click Open in editor and notice that a new browser tab opens with your Git folder in a new “focused” authoring mode. We’ll discuss this in a moment.
We won’t use this view right now, so close the browser tab that was just opened.
Now, click the Git branch button as shown below.
In the Git integration UI, click Create Branch, assign a name to your feature branch (e.g., features/initial_work_on_hello_world), and then click Create.
Click the “X” button in the top right corner of the Git integration dialog to close it. Using your Microsoft Entra credentials, the Databricks Git integration created the branch in your Azure DevOps repo and switched you to it in the Workspace UI, as shown below.
As of April, 2026, there are two UI previews that affect the behavior of the menus shown in this article. Have your workspace Admin go to Previews and search for “tree”. By default, the new “authoring” context preview (the first one, below) is enabled, and the new tree view for the file browser is not. We recommend enabling both so you can follow along with the article and because they both make working with Git folders and asset bundles easier. Here are the documentation links (as of April, 2026) for reference:
From your Git folder, click Create and then choose Bundle.
There are different approaches to organizing your development copy of the bundles in your Git folder. A flat list of bundles, as shown below, works fine. You can also create folders under the Git folder and organize your bundles within them.
For the purposes of this blog, enter a name like hello_world for the Bundle name and choose the Empty project option. After clicking Create, you will see a list of add-on resources for your bundle. Choose Add a new job definition and enter a job name like Update Event Table.
Important: Don’t click the Add and deploy button; instead, click the down arrow and choose Only add, deploy later. If you accidentally clicked Add and deploy, that's okay, but you have deployed a job to dev that doesn't do anything yet.
Let’s add some code to our project to see the CI/CD process in action. For the purposes of this blog, you can download the Hello_World.ipynb notebook from this location and then import it into your bundle folder as shown below.
Be sure to import it into the hello_world asset bundle folder (not the resources folder), as shown below.
While still in the job.yml file, uncomment lines 8-19 using the Toggle line comment key combination (you can get a list of keyboard shortcuts from the context menu, as shown below). After uncommenting the lines, make the two changes shown in yellow. Be aware that paths are case-sensitive. You have now renamed the first task in your job and pointed it to the notebook you just imported.
Open the Hello_World notebook and note a couple of things. The first cell defines a widget called “catalog” with no default. The notebook you imported has rendered the catalog widget at the top, but it is empty, so you can fill in the catalog you want to write to. For this lab, we’ll create a catalog called bu1_dev.
Enter your value for the “catalog” widget field as shown below. Notice that there is also a widget called “job_run_id” that has no default value.
Go ahead and run the notebook now by clicking the Run all button at the top. You should use Serverless compute for this article so you don't have to wait for a cluster to start.
Notice in the output and the Catalog Explorer that one row was inserted, including the time and the user who ran the code.
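Conceptually, the notebook performs a single insert of the current time and user into the events table per run. The sketch below mimics that in plain Python, with SQLite standing in for the bu1_dev catalog; the column names and helper function are illustrative assumptions, not taken from the actual notebook.

```python
import os
import sqlite3
from datetime import datetime, timezone

# SQLite stands in for bu1_dev.default.test_events; column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_events (event_time TEXT, user_name TEXT)")

def insert_event(conn: sqlite3.Connection) -> None:
    """Insert one event row per run: the current time and the user running the code."""
    conn.execute(
        "INSERT INTO test_events VALUES (?, ?)",
        (datetime.now(timezone.utc).isoformat(), os.environ.get("USER", "unknown")),
    )
    conn.commit()

insert_event(conn)
row_count = conn.execute("SELECT COUNT(*) FROM test_events").fetchone()[0]
print(row_count)  # -> 1
```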
In the workspace, click the bundle deploy button (the “rocket ship”), wait until the Bundle resources list has finished loading, and then click the Deploy button.
Click Deploy again to confirm that your updated job definition should be deployed.
Congratulations! You should see a message that you successfully deployed to your dev target.
Click Workspace in the side navigation to see what got deployed and where.
Databricks deploys the asset bundle to your Home/.bundle folder. This is represented by the dev deployment environment highlighted in blue below. At this point, nothing is in Azure DevOps version control.
Click Jobs & Pipelines in the side navigation.
Your bundle deployment also created a job. Notice that the job title is prepended with [dev your_user_id].
Open the job and click the Tasks tab. Review the information below.
Messages (1) and (4) state that this job was created by a bundle deployment and that you shouldn’t edit this job in the UI (you can edit it, but your edits will be wiped out the next time you deploy your bundle). The name of your notebook task (2) matches what you edited back in the Update Event Table.job.yml file. The notebook path (3) you specified in the job.yml file is being used. The job is set to run as you (5). Currently, no job parameters (6) are shown – we will work on this next. There is a Disconnect from source (7) button in the UI.
Source-linked deployment is an optional feature of bundle deployment and is enabled by default for the dev deployment. This is a convenient feature because your dev job references the notebook from your Home/bu1-team1/hello_world folder in the workspace. A copy isn’t pushed into the bundle deployment folder for dev (Home/.bundle/hello_world). What’s convenient about source-linked deployment in dev is that you can make iterative fixes to your notebook and re-run your Job from the Jobs UI without having to redeploy your asset bundle.
Go back to the Runs tab and click the Run now button. Use Serverless job compute so you don’t have to wait for a job cluster to start.
The job will fail because the notebook can’t find a value for the catalog variable. When you ran the notebook manually, you typed a value into the catalog widget, but when the notebook ran from a job, the catalog widget was empty, and there was no default value specified in the first cell either.
We don’t want a default value for the catalog because we intend to run this notebook across dev, stage, and prod environments. We want to send specific catalog names to the job based on the environment it's running in. Let’s configure that next.
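The pattern here — fail fast when a required parameter is missing rather than falling back to a default — can be sketched in plain Python. The helper name and error message below are mine for illustration, not part of the Databricks API:

```python
def get_required_param(params: dict, name: str) -> str:
    """Return a required parameter value, failing fast if it is missing or empty."""
    value = params.get(name, "").strip()
    if not value:
        raise ValueError(
            f"Required parameter '{name}' is empty; "
            "pass it from the job (e.g., via a bundle variable per target)."
        )
    return value

# A job run in dev would pass catalog=bu1_dev; a run with no value fails immediately.
print(get_required_param({"catalog": "bu1_dev"}, "catalog"))  # -> bu1_dev
try:
    get_required_param({"catalog": ""}, "catalog")
except ValueError as e:
    print(f"Run fails: {e}")
```

Failing immediately with a clear message is preferable to silently writing into the wrong catalog.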
In this section, you’ll be working in two files, which you’ll download in a moment. But let’s first understand how they work together:
For more information on these topics, click the links behind the terms above. Visually, the variable/parameter wiring looks like this:
As you can see, the bundle configuration file (databricks.yml) contains general setup information on lines 1-12 and environment targets starting on line 14. Each target (dev, staging, and prod) defines the bundle's characteristics for that environment. This includes how to access the Databricks workspace, which variable names/values to pass, and which deployment mode to use for the environment.
We define a bundle variable catalog_var and set it to bu1_dev when deploying to the “dev” target environment. We’ll set it to bu1_stg for the staging environment, and so on. These environmental targets are defined by you. You can have as many as you like and name them however you like. However, the bundle target name must match the value specified in your Azure DevOps pipeline for the bundle validate and deploy steps, as you will see.
The deployment mode setting has two values: development and production. The dev target uses development mode, whereas staging and prod both use production mode. Development mode prefixes your deployed jobs with [dev <your_user_name>] so they are easier to distinguish when you and your colleagues are working on separate copies of the same bundle/pipeline in dev. Development mode also enables source-linked deployment, as previously described, making it easier to iterate on your bundle wiring and notebook code. Here's a complete reference for the behavior of the bundle mode setting.
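To illustrate the effect of development mode's name prefixing, here is a tiny Python sketch. This is illustrative only — the real prefixing (including how the user name is rendered) is performed by the bundle deployment, not by user code:

```python
def dev_display_name(resource_name: str, user_name: str) -> str:
    """Mimic how development mode prefixes deployed resource names.

    Illustrative only: the actual prefix format is applied by the bundle
    deployment and may render the user name differently.
    """
    short_user = user_name.split("@")[0]
    return f"[dev {short_user}] {resource_name}"

print(dev_display_name("Update Event Table", "jane.doe@example.com"))
# -> [dev jane.doe] Update Event Table
```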
To save you time, here is the completed code for the example we are following. Copy these into your Update Event Table.job.yml and databricks.yml files.
Specify your email in the highlighted line below.
Note that the code includes a second parameter to capture the job run ID so we can write it into the event table. The syntax {{job.run_id}} captures the run ID dynamically. We’re appending that to the string “job_run_id_”, which you will see when this parameter is written into the table. You can change the parameter definition and remove the prefix if you wish. Because this parameter is defined dynamically by the job, you do not need to create a bundle variable for it as you did with catalog_var.
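If downstream code later needs the numeric run ID back, it can strip the prefix. The helper below is a hypothetical sketch (the function name and the sample run ID are mine for illustration):

```python
PREFIX = "job_run_id_"

def extract_run_id(param_value: str) -> int:
    """Recover the numeric run ID from a 'job_run_id_<id>' parameter value."""
    if not param_value.startswith(PREFIX):
        raise ValueError(f"Unexpected format: {param_value!r}")
    return int(param_value.removeprefix(PREFIX))

# At runtime, "job_run_id_{{job.run_id}}" renders to something like this:
print(extract_run_id("job_run_id_1128231405"))  # -> 1128231405
```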
```yaml
resources:
  jobs:
    Update_Event_Table:
      name: Update Event Table
      queue:
        enabled: true
      parameters:
        - name: catalog
          default: ${var.catalog_var}
        - name: job_run_id
          default: "job_run_id_{{job.run_id}}"
      tasks:
        - task_key: notebook_task_1
          notebook_task:
            notebook_path: ../Hello_World.ipynb
            base_parameters:
              catalog: ${var.catalog_var}
              job_run_id: "job_run_id_{{job.run_id}}"
      email_notifications:
        on_failure:
          - your_email@example.com
      trigger:
        periodic:
          interval: 1
          unit: DAYS
```
Edit the workspace URLs to match your environment and change the catalog names if necessary. Note that we’re using the same workspace for dev and staging in this blog, but you can use three separate workspaces for dev/staging/prod if you wish. Just make the edits below.
```yaml
# Declarative Automation Bundle (DAB) configuration; more info here:
# https://docs.databricks.com/en/dev-tools/bundles/settings.html
bundle:
  name: "hello_world"

include:
  - resources/*.yml
  - resources/*/*.yml

# Define custom variables; more info here:
# https://docs.databricks.com/en/dev-tools/bundles/reference#variables
variables:
  catalog_var:
    description: Custom variable containing the target catalog
    default: ""

targets:
  # TODO - adjust the catalog names and workspace URLs to match your deployment

  # The "dev" target used by the bundle deploy feature
  dev:
    mode: development
    default: true
    variables:
      catalog_var: bu1_dev
    workspace:
      host: https://adb-###########.#.azuredatabricks.net

  # The "staging" target used by Azure DevOps
  staging:
    mode: production
    variables:
      catalog_var: bu1_stg
    workspace:
      host: https://adb-###########.#.azuredatabricks.net
      root_path: /Workspace/devops/staging/bu1-team1/.bundle/${bundle.name}

  # The "prod" target used by Azure DevOps
  prod:
    mode: production
    variables:
      catalog_var: bu1_prd
    workspace:
      host: https://adb-###########.#.azuredatabricks.net
      root_path: /Workspace/devops/prod/bu1-team1/.bundle/${bundle.name}
```
Important: After updating the workspace URLs in your databricks.yml file, please click the reload button in your web browser.
This is necessary because we sourced the file with placeholder text (###########) for the workspace URL, and the bundle UI reads the file as soon as you edit it. Until you reload, it may think there are no valid targets to deploy to. After making your edits and reloading the workspace UI in your web browser, continue to the next step.
Click the bundle deploy button and then click the Target pulldown menu.
Notice the message saying that deployment to other workspaces is not supported. In other words, the bundle deploy feature is designed to deploy only locally, to the workspace you are in.
We will use the bundle deploy button only to push the bundle to our local dev target. We will use Azure DevOps to deploy our bundles to the staging and prod environments.
Press the Escape (ESC) key to close the drop-down menu and then click the Deploy button.
Click Deploy again and review the output messages below.
After your DAB deploys to dev, go to your Jobs page and click the run button to start your job. After you start it, you can click your job and see it running.
When it finishes, click your job and notice that the catalog and job_run_id parameters were created and populated as expected, and your code added another row to the bu1_dev.default.test_events table.
With your code working in dev, let’s check it into Git.
Click the Workspaces (the “folder”) button and take a few minutes to familiarize yourself with the new Workspace UI. Click the “All files” button and then go back to the “Folder” button next to it. Notice that this simply toggles whether you are in the DAB bundle folder. Also, notice the open file tabs on the right. It’s nice to be able to have multiple files open at one time, even in different folders/subfolders, and switch among them without having to navigate the folder structure on the left.
When you are ready, click the Git button next to your bundle folder.
Go ahead and add some comments and click Commit & Push for your first DAB project!
In the message at the top of the screen, click the create a pull request link to create your pull request here, for convenience. This takes you right into Azure DevOps, where you can fill in the details for your Pull request!
Click Create to create your PR and then stop here.
Let’s go set up the Azure DevOps deployment pipeline. This way, when we go into DevOps to approve and merge your PR, we should see our new DevOps pipeline automatically deploy your initial hello world code to staging.
As a review, our CI/CD requirements are:
We’ll begin by setting up the staging process, which involves creating a DevOps service connection, mapping it to an Entra ID-managed Databricks service principal (SP), creating a workspace deployment folder for staging, and granting the SP permissions on the folder and the bu1_stg catalog in Unity Catalog. We’ll repeat this process for prod. Then we’ll create the DevOps variable groups, prod environment gate/approvers, and pipeline to pull it all together. After that, we can test it out!
Here’s an overview of the steps that you’ll perform twice (once for staging and once for prod), except where noted:
| Step | Where | What | Why |
| --- | --- | --- | --- |
| 1 | Azure DevOps > Project Settings | Create the bu1-team1 Service Connection | Facilitates a secure and flexible connection between DevOps and Azure Databricks for deploying bundles to staging and prod. For a comparison with the newer Databricks OIDC-based approach, see the topic at the end of this article. |
| 2 | Azure Portal | Create the bu1-team1-cicd App Registration and federated credential | |
| 3 | Databricks workspace > Settings | Create the bu1-team1-cicd Service Principal | This will be the owner of the resources deployed by DevOps; you need to give this principal write access to the deployment folder and privileges in Unity Catalog. |
| 4 | Databricks workspace > Folders | Grant your SP and any admin groups permissions on the /staging/bu1-team1 folder | By deploying into team-based folders, you can set permissions so each team can view its own deployed code if necessary. |
| 5 | Databricks workspace > Catalog | Give the SP and other groups appropriate permissions in Unity Catalog | Tables and schemas created by the SP are owned by the SP. |
| 6 | Azure DevOps > Library | Create a variable group for each stage | These hold the variable DATABRICKS_HOST with the URL of the workspace for the stage. |
| 7 | Azure DevOps > Environments | Create this only for prod | This will be used to set up the approval gate for prod deployment. |
| 8 | Azure DevOps > Repo | Create a root folder called .cicd or similar for the pipeline code | This provides separation between your pipeline and bundle code. Consider requiring special approvers for code changes to files in this root folder. |
| 9 | Azure DevOps > Pipelines | Create a release pipeline for your hello_world project | This will deploy your code to staging and prod after the required approvals. |
| 10 | | Test it out! Check the messages written into your event table in the staging and prod catalogs. | Consider how to organize your bundles and deployment pipelines. A pipeline can deploy multiple bundles, or you can keep them 1:1 for finer control. |
Go to your DevOps project’s Project settings > Pipelines > Service connections and click Create service connection.
Choose the Azure Resource Manager option and click Next.
Important! For Identity type, you’ll see two options.
Choose the App registration or managed identity (manual) option highlighted below (do not choose the one tagged as recommended).
This lets you clearly name the Azure App Registration backing your DevOps Service Connection. This is important because the name of your Azure App Registration is synchronized with the name of your Databricks Service Principal, and this is what your data team will be searching for when granting permissions on your Unity Catalog and Workspace resources.
If you choose the other option, the App Registration name will be automatically generated by Azure as <devops_org>-<project>-<GUID>, with no indication of whether it is for staging or prod. You cannot rename it, and this will make it hard to assign permissions later. The manual approach we recommend in this article involves a few more steps, but it is your best option as of April 2026.
Note: Both approaches involve creating an App registration in your Azure portal, so if you don’t have the permissions to do this, you’ll get an error and will need to work on these steps with someone who does.
After choosing App registration or managed identity (manual), fill in the Service Connection Name (e.g., bu1-team1-staging-svc-conn), description, and your Directory (tenant) ID under the Part 1: Basics section, and then click Next.
This takes you to Part 2: App registration details.
Tip: to find your tenant ID, click your user ID circle in the DevOps portal banner bar and then click the Switch directory link. The tenant ID will be the string under the current directory.
At this point, you’ll need to keep the Part 2 screen open and open the Azure Portal in another tab. You can follow the steps below and see them illustrated in the image below.
Don’t click Verify and save yet! You need to give your App Registration permission to read your Subscription. If you clicked it anyway, look carefully at the error message. You’ll see that your Part 1 work was saved as a draft. You can come back and complete Part 2 after you do the required steps.
Go to the Azure Portal and find your Subscription. Go to Access control (IAM) and click Add role assignment. Click Reader under Name and then click Next. Leave the Assign access to option set to User, group, or service principal and click the Select members link. Find your service principal (note: your App Registration previously created a service principal in the background with the same name, e.g., bu1-team1-cicd-staging-app), pick it, and click Select.
Click Review + Assign.
Now return to Azure DevOps and your Edit service connection Part 2 screen and click the Verify and save button.
Congratulations! That’s a lot of steps, but it's necessary because you need to establish a trust relationship between your Azure DevOps project and your Azure Portal App Registration/Service Principal; these services do not all trust each other by default.
Now we are going to configure trust between our Azure App Registration/Service Principal and the Databricks Service Principal that will run the resources Azure DevOps deployed in your workspace.
Go to your staging workspace (in this article, we use the bu1-dev workspace). Go to Workspace Settings > Identity and access > Service principals and click Manage. Click Add service principal and then click the Add new button.
Under Management, choose Microsoft Entra ID managed, and then copy the Application ID from your Azure App Registration over.
Note: as of April 2026, the UI shown below for the Microsoft Entra ID managed option shows a field for entering the service principal name. While this field allows you to enter and save a value, it will not be retained, as you will see when you go to search for this service principal in the permissions UI throughout Databricks.
Give it any name like bu1-team1-cicd-staging-sp (but you’ll see it will be changed back to the name of your App Registration shortly) and click Add.
Note: It’s possible to use a single Azure DevOps Service Connection, Azure App Registration, and Databricks Service Principal for all of your CI/CD processes (including both staging and production deployments). However, you’ll notice we’ve been including the term “staging” in the names of these CI/CD resources we’ve been creating, so you can later add the “prod” versions and have more precise governance over your CI/CD process.
Click your Databricks SP after it's created. It will need the Workspace Access and SQL access entitlements, depending on your code. I recommend using Serverless jobs, especially during development, to speed up your trial-and-error CI/CD testing. If you use Serverless jobs, you do not need to assign the Allow unrestricted cluster creation entitlement.
Explore the Permissions tab. This controls who can manage and use the SP, not what the SP can do in your workspace. Consider whether a team needs to manually re-run a job in the staging environment for some reason; if so, give that team the USE permission.
Explore the Git integration tab. This feature allows your SP to connect with a Git repo, which is needed when the job is configured to pull notebook code from Git at runtime. But we are not using that approach in this blog. We are configuring DevOps to push the notebook code and bundle resources to each environment. Leave Git integration unconfigured for the SP.
In the workspace, create a staging location where the Azure DevOps pipeline will deploy your code. Then right-click this folder and choose the Share (Permissions) menu. Search for your SP and grant it Can Manage permissions, as shown below (remember, as mentioned previously, you are searching for the App Registration you created in the Azure portal, which is synced to the name of the Databricks service principal you just created).
If you would like your development team to see what is deployed here, grant them View permissions. Generally, developers, QA teams, and admins should not have write access to the staging bundle target deployment folder.
There are a few strategies for creating and managing permissions on catalogs and schemas. In this blog, we create the catalogs manually and assign Data Editor permissions to our SP. We also grant the Manage permission so our SP can drop tables and manage permissions if we were to do that in our job code. Feel free to reconsider and evolve your choices as necessary.
Give your SP permission to create data assets in your stage catalog as shown below.
Go into the default schema in your staging catalog and notice that many of these permissions are inherited as expected. Look at the permissions on your bu1_stg.default schema. Notice that your SP can read and write to this schema even though it is not the owner (you are the owner since it was created when you made the catalog).
We want our CI/CD pipeline to have two stages:
We’ll use DevOps variable groups to store the workspace URLs and a DevOps environment to define the approval process that gates our prod deployment.
Go to your DevOps project > Pipelines > Library and click the + Variable group button. Name it stagingVariables (you’ll reference this in the first stage of your pipeline) and add a variable for DATABRICKS_HOST along with a URL pointing to the hostname of your staging workspace (bu1-dev in the example for this article). Then click Save.
Repeat the process for another variable group called prodVariables, which you will reference in the second stage of your pipeline.
Go to your DevOps project > Environments and click the Create environment button. Name it prod (you’ll reference this in the second stage of your pipeline), leave Resource set to None, and click Create.
Next, click the Approvals and checks tab and click the Approvals check. For this article, we’ll just pick two approvers, but you can configure this as required by your organization.
You do not need to create a staging environment, as we plan to deploy automatically after merging to main with no conditions or approvals.
Go to your DevOps project > Repos > Files page and create a new folder (e.g., .cicd) and your Hello World pipeline inside it.
Copy the following YAML into it and edit both azureSubscription values (one per stage) to match the names of your DevOps Service Connections (yes, it's confusing: the field is called azureSubscription, but it takes the service connection name). Review the comments and make any other necessary adjustments.
# This triggers the pipeline on a commit to the 'main' branch
trigger:
  branches:
    include:
      - main

# Prevents the pipeline from triggering on Pull Requests; only on merges
pr: none

stages:
  # -----------------------------------------------
  # Stage 1: Deploy to Staging (automatic on merge)
  # -----------------------------------------------
  # TODO - In the bundle validate/deploy steps below, change "workingDirectory"
  # to the folder containing your databricks.yml and edit the target values if
  # your choice differs from "staging" used in the example below.
  - stage: DeployStaging
    displayName: 'Deploy to Staging'
    variables:
      - group: stagingVariables
    condition: eq(variables['Build.SourceBranch'], 'refs/heads/main')
    jobs:
      - job: deployStagingJob
        pool:
          vmImage: ubuntu-latest
        steps:
          - task: UsePythonVersion@0
            inputs:
              versionSpec: '3.11'
          - task: AzureCLI@2
            displayName: 'Get Databricks Token (Staging)'
            inputs:
              # TODO - Change azureSubscription value to the name of your DevOps "staging" Service Connection
              azureSubscription: bu1-team1-staging-svc-conn
              scriptType: 'bash'
              scriptLocation: 'inlineScript'
              failOnStandardError: true
              # Set the token as a pipeline variable for subsequent steps; note that the "echo"
              # line does not print the token to the pipeline logs. The resource ID is the well-known
              # Azure AD application ID for Databricks (same for all tenants).
              inlineScript: |
                set -e
                DATABRICKS_TOKEN=$(az account get-access-token --resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d --query "accessToken" -o tsv)
                echo "##vso[task.setvariable variable=DATABRICKS_TOKEN]$DATABRICKS_TOKEN"
          # Clone the repository so the bundle files are available to the pipeline agent
          - checkout: self
            displayName: 'Checkout repository'
          # Install Python dependencies here if your bundle requires them
          # For example:
          # - script: pip install -r requirements.txt
          #   displayName: 'Install Python dependencies'
          - script: |
              curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
              databricks --version
            displayName: 'Install Databricks CLI'
          - script: databricks bundle validate -t staging
            displayName: 'Validate bundle for the "staging" environment'
            env:
              DATABRICKS_HOST: $(DATABRICKS_HOST)
              DATABRICKS_TOKEN: $(DATABRICKS_TOKEN)
            workingDirectory: $(Build.SourcesDirectory)/hello_world
          - script: databricks bundle deploy -t staging
            displayName: 'Deploy bundle to the "staging" environment'
            env:
              DATABRICKS_HOST: $(DATABRICKS_HOST)
              DATABRICKS_TOKEN: $(DATABRICKS_TOKEN)
            workingDirectory: $(Build.SourcesDirectory)/hello_world

  # -----------------------------------------------
  # Stage 2: Deploy to Production (requires approval in Azure DevOps)
  # -----------------------------------------------
  # TODO - In the bundle validate/deploy steps below, change "workingDirectory"
  # to the folder containing your databricks.yml and edit the target values if
  # your choice differs from "prod" used in the example below.
  - stage: DeployProd
    displayName: 'Deploy to Prod'
    dependsOn: DeployStaging
    condition: succeeded()
    variables:
      - group: prodVariables
    jobs:
      # A "deployment:" job (rather than a regular "job:") is required to
      # connect to an Azure DevOps Environment and enable the approval gate.
      - deployment: deployProdJob
        displayName: 'Deploy to Prod'
        pool:
          vmImage: ubuntu-latest
        # Set up the approval gate. The value for "environment" must match
        # the Azure DevOps Environment name where approvals are configured.
        environment: 'prod'
        strategy:
          runOnce:
            deploy:
              steps:
                - checkout: self
                - task: UsePythonVersion@0
                  displayName: 'Use Python 3.11'
                  inputs:
                    versionSpec: '3.11'
                - task: AzureCLI@2
                  displayName: 'Get Databricks Token (Prod)'
                  inputs:
                    # TODO - Change azureSubscription value to the name of your DevOps "prod" Service Connection
                    azureSubscription: bu1-team1-prod-svc-conn
                    scriptType: 'bash'
                    scriptLocation: 'inlineScript'
                    failOnStandardError: true
                    # Set the token as a pipeline variable for subsequent steps; note that the "echo"
                    # line does not print the token to the pipeline logs. The resource ID is the well-known
                    # Azure AD application ID for Databricks (same for all tenants).
                    inlineScript: |
                      set -e
                      DATABRICKS_TOKEN=$(az account get-access-token --resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d --query "accessToken" -o tsv)
                      echo "##vso[task.setvariable variable=DATABRICKS_TOKEN]$DATABRICKS_TOKEN"
                # Install Python dependencies here if your bundle requires them
                # For example:
                # - script: pip install -r requirements.txt
                #   displayName: 'Install Python dependencies'
                - script: |
                    curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
                    databricks --version
                  displayName: 'Install Databricks CLI'
                - script: databricks bundle validate -t prod
                  displayName: 'Validate bundle for the "prod" environment'
                  env:
                    DATABRICKS_HOST: $(DATABRICKS_HOST)
                    DATABRICKS_TOKEN: $(DATABRICKS_TOKEN)
                  workingDirectory: $(Build.SourcesDirectory)/hello_world
                - script: databricks bundle deploy -t prod
                  displayName: 'Deploy bundle to the "prod" environment'
                  env:
                    DATABRICKS_HOST: $(DATABRICKS_HOST)
                    DATABRICKS_TOKEN: $(DATABRICKS_TOKEN)
                  workingDirectory: $(Build.SourcesDirectory)/hello_world
Click Commit to save it to the main branch of your repo.
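Before relying on the pipeline, you can sanity-check the same bundle commands from your own machine. The sketch below is a hedged convenience, not part of the pipeline: the host URL is a placeholder you must replace, it requires a prior `az login` plus a local Databricks CLI install, and the guard lets it exit quietly on machines where those tools are absent. The GUID is the well-known Azure AD application ID for Azure Databricks, identical in every tenant.

```shell
# Hedged local sanity check: fetch a Databricks token the same way the
# AzureCLI@2 task does, then validate the bundle. The guard skips everything
# when the az or databricks CLI is not installed.
if command -v az >/dev/null 2>&1 && command -v databricks >/dev/null 2>&1; then
  export DATABRICKS_HOST="https://adb-1111111111111111.11.azuredatabricks.net"  # placeholder; use your staging URL
  DATABRICKS_TOKEN=$(az account get-access-token \
    --resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d \
    --query accessToken -o tsv)
  export DATABRICKS_TOKEN
  if databricks bundle validate -t staging; then result="validated"; else result="failed"; fi
else
  result="cli-not-installed"
fi
echo "local check: $result"
```

Run it from the folder containing your databricks.yml; if validation fails there, the pipeline step will fail for the same reason.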
Before creating the DevOps pipeline and pointing it at this file, let’s approve and merge the initial Hello World DABs pull request so there is some code to deploy.
In your Azure DevOps project, click Repos > Pull requests and select the pull request you created earlier from the Databricks Git integration UI.
Click Complete and then choose your merge options before clicking the Complete merge button.
You can verify that your bundle code is now in the main branch.
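If you prefer the command line to the DevOps UI, plain git can do the same verification. The demo below builds a throwaway repository so it runs anywhere; against your real clone you would instead run `git fetch origin` followed by `git ls-tree -r origin/main --name-only`. The hello_world path is this article’s example bundle folder.

```shell
# Self-contained demo: create a scratch repo containing a bundle file, then
# use git ls-tree to confirm databricks.yml is present on the branch.
tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main .
mkdir -p hello_world
printf 'bundle:\n  name: hello_world\n' > hello_world/databricks.yml
git add hello_world/databricks.yml
git -c user.email=demo@example.com -c user.name=demo commit -qm "add bundle"
# The check you would run against origin/main after completing the pull request:
found=$(git ls-tree -r main --name-only | grep databricks.yml)
echo "found: $found"
```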
Now let’s create the pipeline. Go to your project > Pipelines > Pipelines menu and click Create Pipeline. Choose Azure Repos Git.
Choose your team repository and then click Existing Azure Pipelines YAML file.
Enter the path to your pipeline file and click Continue.
Before you can run the pipeline, review the options DevOps presents. Once you have decided, click Run.
You do not need to set or change anything on the next screen that comes up.
Your Deploy to Staging job will run. The first time it runs, you will see a message asking for permission to access the staging variable group. Give it permission.
When the staging deployment finishes, find the deployed job in your staging workspace and run it.
Verify that the data was added by the service principal. Depending on where you look in the UI, you will see either the App Registration’s application ID or its name.
While the code deployment, job execution, and data writing to the table were all performed by the Azure App Registration’s SP, it was the Workspace Admin who enabled it in Workspace Settings, and the Unity Catalog owner who granted it editor rights on the schema.
When you are ready, click the Review button and approve running the prod deployment stage.
Your bundle should be deployed to the prod workspace. Find the job and run it, then check the prod catalog to see that it ran.
Congratulations! 🎉 If you made it this far, you are now an Azure Databricks CI/CD champion. 😁
Please share your thoughts about this article in the comments section. What concepts did you learn from this blog that were helpful? Is there anything you don’t understand or are unsure of? What aspects of the example shown do you think should be expanded on for an intro CI/CD best practice article? What other topics would you like to be expanded on in another Technical Blog?
The approach taken for integrating Azure DevOps and Databricks in this blog uses App registration-backed Azure DevOps service connections. This is a well-established pattern for securely federating your pipeline workload between the two cloud services. But Databricks recently introduced an alternative OIDC-based approach. How do these compare, and which is the best option?
Both approaches use secure, token-based protocols (OIDC and OAuth). The difference is the issuer. In the blog, I use the DevOps service connection feature, which means the DevOps pipeline (via the AzureCLI@2 task) obtains an Azure AD access token and passes it to the Databricks CLI as DATABRICKS_TOKEN. This works well when your pipeline also needs access to other Azure resources (Storage, Key Vault, etc.) through the same service principal.
Databricks also supports a newer alternative: OAuth token federation (also called workload identity federation or OIDC). With this approach, the Azure DevOps pipeline authenticates directly to Databricks — no Azure AD intermediary, no service connections, and no explicit token retrieval step. You create a federation policy on the Databricks service principal that trusts your Azure DevOps organization's OIDC issuer, and the Databricks CLI handles the token exchange automatically using the pipeline's built-in System.AccessToken.
This does result in a simpler pipeline (no AzureCLI@2 task, no Azure service connections), but it only authenticates to Databricks — if you need access to other Azure resources, you'll still need the other service connection approach shown in this blog. To learn more, see https://learn.microsoft.com/en-us/azure/databricks/dev-tools/auth/provider-azure-devops in the Databricks documentation.
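For comparison, a pipeline stage using the OIDC approach might shrink to something like the sketch below. Treat this as a hedged illustration rather than a verified configuration: the DATABRICKS_AUTH_TYPE value and the way the pipeline’s built-in System.AccessToken is exposed are assumptions to confirm against the linked documentation, and the federation policy on the Databricks service principal must already exist.

```yaml
# Hedged sketch of the OIDC variant: no AzureCLI@2 task, no service connection.
# Env var names here are assumptions; confirm them in the Databricks docs.
steps:
  - checkout: self
  - script: |
      curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
      databricks bundle deploy -t staging
    displayName: 'Deploy bundle via OIDC federation'
    env:
      DATABRICKS_HOST: $(DATABRICKS_HOST)
      DATABRICKS_AUTH_TYPE: azure-devops-oidc      # assumed auth-type name
      SYSTEM_ACCESSTOKEN: $(System.AccessToken)    # pipeline's built-in token for the exchange
```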
| Aspect | Approach 1: DevOps service connection | Approach 2: Databricks OIDC federation |
| --- | --- | --- |
| Doc link | Connect to Azure using App registration with workload identity federation | Enable workload identity federation for Azure DevOps Pipelines |
| Maturity | Well-established pattern | Newer (Databricks OAuth federation) |
| Protocols used | OIDC, OAuth 2.0, bearer token | OIDC → Databricks OAuth 2.0 |
| Auth intermediary | Azure AD (token broker) | None (direct OIDC) |
| Token acquisition | Explicit az account get-access-token | Automatic by Databricks SDK/CLI |
| Azure CLI in pipeline | Required | Not needed |
| Azure service connections | Required (one per env) | Not needed |
| Federation policy | On Azure AD (trusts Azure DevOps) | On Databricks SP (trusts Azure DevOps) |
| Complexity | Two hops; 8 pipeline steps per stage | One hop; 2-3 pipeline steps per stage |
| Azure resource access | The same SP can access other Azure resources (Key Vault, etc.) | Only authenticates to Databricks |
| Better when | The pipeline also needs other Azure resources | The pipeline only needs to reach Databricks |
| What’s different | -- | Skips Azure AD entirely. A federation policy is created directly on the Databricks service principal that trusts the Azure DevOps OIDC issuer (https://vstoken.dev.azure.com/<org_id>). The pipeline YAML is minimal. |
| Requires | -- | A federation policy on the Databricks service principal trusting your Azure DevOps organization’s OIDC issuer |