Technical Blog
Explore in-depth articles, tutorials, and insights on data analytics and machine learning in the Databricks Technical Blog. Stay updated on industry trends, best practices, and advanced techniques.
jim_thorstad
Databricks Employee

Establishing a trusted Continuous Integration/Continuous Deployment (CI/CD) process is crucial for effectively managing the lifecycle of your data and AI workloads in Azure Databricks. However, with numerous setup options and a constantly evolving platform, the big question is, "Where do I even start?" This comprehensive blog cuts through the complexity to outline a recommended CI/CD approach for Azure Databricks. 

jim_thorstad_0-1776186435312.png

We focus on the latest best practices that support strong governance with minimal administrative overhead, even when supporting multiple teams on shared infrastructure. The approach is tailored for developers who enjoy working in the Databricks workspace UI, though it also fully supports IDEs like Visual Studio Code.

The recommended approach uses Declarative Automation Bundles (DABs; formerly known as Databricks Asset Bundles) for deployment. Bundles seamlessly integrate with Databricks Git Folders for version control, Azure DevOps for automation, and Azure DevOps service connections for Databricks connectivity. At the end of the blog, I compare the DevOps service connection and Databricks OIDC approaches for integrating DevOps and Databricks for your reference.

Finally, we’ll follow a simple but effective trunk-based branching strategy in DevOps.  Staging will be deployed automatically after feature branch pull requests are approved and merged into main. Production deployment from main will be contingent on a successful staging deployment and will require an additional approval step, which we will implement.

jim_thorstad_1-1776186629097.png

This blog covers a wide range of topics, but by guiding you toward the optimal choices, it will save you significant time and effort in the long run. The content is divided into three parts:

  1. Aligning your infrastructure and governance requirements
  2. Setting up a versioned bundle development process
  3. Setting up an Azure DevOps pipeline for deployment to staging and production

Part 1: Align Your Infrastructure and Governance Requirements

Every organization needs to decide what level of separation and organization is required for Databricks workspaces, deployment environments, Unity Catalog, DevOps projects, and Git repositories. There is no one right answer for every organization, so we'll only touch on the key considerations before moving on to the article's main topic.

Topic: Environments
Consideration: Three is common, but some may use two, and others four or more.
For this article: We'll use dev, staging, and prod.

Topic: Databricks Workspaces
Consideration: Generally, workspaces align with environments. Larger organizations may also organize business units into separate workspaces. An advantage of three workspaces is that the workspace admin can freely enable preview features and other settings in the "dev" workspace and be certain not to affect the staging and prod workloads.
For this article: We'll show how multiple independent teams can use two workspaces (bu1-dev and bu1-prod) with reasonable governance; staging will deploy to a secure folder in the dev workspace. You can also follow along using three workspaces, with dev, staging, and prod each in its own workspace.

Topic: Catalogs
Consideration: There are many strategies for organizing your data in Unity Catalog.
For this article: We'll name our catalogs bu1_dev, bu1_stg, and bu1_prd, demonstrating how independent business units can operate within a single Unity Catalog metastore.

Topic: Azure DevOps Projects
Consideration: Smaller organizations use a single project to simplify setup, while larger organizations may require multiple projects to create hard boundaries between code repositories and approvers.
For this article: We'll use a project called bu1-team1, allowing independent teams (e.g., HR, Legal, Finance) within a "corporate" BU to operate securely within the shared workspaces alongside other BU teams.

Topic: Git Repository
Consideration: There are pros and cons to a single "mono repo" versus multiple team- or domain-based repos.
For this article: Since we have already selected a team-based DevOps project, we'll use its single default repo.

Note: I encourage you to follow the example and organizational strategy outlined in this blog to see how the whole process works together. Then consider what you've learned, how your organization's governance needs may differ, and make the necessary changes. Having made these choices, let's look over our organizational strategy visually.

jim_thorstad_2-1776186912117.png

Within our “corporate” organization (BU1), Team 2 handles HR data, and everything else is handled by Team 1. These two teams will work independently in our shared bu1-dev and bu1-prod workspaces. For this to work well, it is important to note that one or more individuals must be trusted by both teams to serve as workspace Administrators. If this can be arranged, infrastructure management is simplified; if not, teams 1 and 2 should be separated into their own dev/prod workspaces.

Down the road, another division (BU2) could be acquired that shares the same Entra ID directory but elects to have separate workspaces and DevOps infrastructure. That’s okay, and it's the reason we’ll prefix our catalog names in Unity Catalog (e.g., bu1_dev, bu2_dev, …).

We’ll align with the “mono repo” approach within each DevOps project and further organize our code using Declarative Automation Bundles. In this blog, we’ll use a single “Hello World” bundle to illustrate how the recommended development and CI/CD processes work. In practice, you will have many bundles for a given data domain (e.g., finance), and these can all be versioned in the same repo. We’ll use a single DevOps pipeline for staging and production deployments, but you may choose a different pipeline strategy.

jim_thorstad_3-1776186945297.png

Part 2: Setting Up a Versioned Bundle Development Process

Databricks bundles and Git folders work seamlessly with Azure DevOps and set you up for success in your CI/CD process. Let's jump into it.

Initial Azure DevOps Setup

This is pretty straightforward and involves creating the team's Azure DevOps project and initializing the repo. You do not need to create any branches; we will use the built-in main branch and create our feature branches within Databricks using the built-in Git integration UI.

Create your project, initialize the main branch, and then go to the repo and copy the repository URL to your clipboard for the next step.

jim_thorstad_4-1776186982114.png

Create a Linked Account to Access the Repo

In the Databricks workspace, each developer creates their own Azure DevOps (MSFT Entra) linked account credential, allowing them to authenticate to the Git repository. Do this now in your “dev” workspace on the Settings > User > Linked accounts page. You do not need to do this in your prod workspace because there is no Git integration required there (DevOps will push the code to prod).

jim_thorstad_5-1776187016894.png

Create a Git Folder that Points to the Repo

In their Home directory, each developer creates their own Git folder that points to the team’s repo. This step is done by each developer only once. Click Create > Git folder.

jim_thorstad_6-1776187041430.png

Paste in the repo URL from your Azure DevOps project (should still be on your clipboard), then click Create Git folder.

jim_thorstad_7-1776187095926.png

You will notice that Git folders have special properties in the Workspace UI, including these two buttons.

jim_thorstad_8-1776187129932.png

Click Open in editor and notice that a new browser tab opens with your Git folder in a new “focused” authoring mode. We’ll discuss this in a moment. 

jim_thorstad_9-1776187180928.png

We won’t use this view right now, so close the browser tab that was just opened.

Now, click the Git branch button as shown below.

jim_thorstad_10-1776187214034.png

In the Git integration UI, click Create Branch, assign a name to your feature branch (e.g., features/initial_work_on_hello_world), and then click Create.

jim_thorstad_11-1776187253227.png

Click the “X” button in the top right corner of the Git integration dialog to close it. Using your Microsoft Entra credentials, the Databricks Git integration created the branch in your Azure DevOps repo and switched you to it in the Workspace UI, as shown below.

jim_thorstad_12-1776187294011.png

Navigating the Workspace UI

As of April 2026, there are two UI previews that affect the behavior of the menus shown in this article. Have your workspace Admin go to Previews and search for “tree”. By default, the new “authoring” context preview (the first one, below) is enabled, and the new tree view for the file browser is not. We recommend enabling both so you can follow along with the article, and because they both make working with Git folders and asset bundles easier. Here are the documentation links (as of April 2026) for reference:

jim_thorstad_13-1776187328636.png

Create and Deploy Your First Bundle to “Dev”

From your Git folder, click Create and then choose Bundle.

jim_thorstad_14-1776187353760.png

There are different approaches to organizing your development copy of the bundles in your Git folder. A flat list of bundles, as shown below, works fine. You can also create folders under the Git folder and organize your bundles within them. 

jim_thorstad_15-1776187376364.png

For the purposes of this blog, enter a name like hello_world for the Bundle name and choose the Empty project option. After clicking Create, you will see a list of add-on resources for your bundle. Choose Add a new job definition and enter a job title like Update Event Table.

Important: Don’t click the Add and deploy button; instead, click the down arrow and choose Only add, deploy later. If you accidentally clicked Add and deploy, that's okay, but you have deployed a job to dev that doesn’t do anything yet.

jim_thorstad_16-1776187458541.png

Add Some Code and Configure Your Job

Let’s add some code to our project to see the CI/CD process in action. For the purposes of this blog, you can download the Hello_World.ipynb notebook from this location and then import it into your bundle folder.

Be sure to import it into the hello_world asset bundle folder (not the resources folder), as shown below.

jim_thorstad_17-1776187488097.png

Open the job.yml file and uncomment lines 8-19 using the Toggle line comment key combination (you can get a list of keyboard shortcuts from the context menu, as shown below). After uncommenting the lines, make the two changes shown in yellow; be aware that paths are case-sensitive. You have now renamed the first task in your job and pointed it to the notebook you just imported.

jim_thorstad_18-1776187507876.png

Open the Hello_World notebook and note a couple of things. The first cell defines a widget called “catalog” with no default value. The notebook you imported has rendered the catalog widget at the top, but it is empty, so you can fill in the catalog you want to write to. For this blog, we’ll create a catalog called bu1_dev.

Enter your value for the “catalog” widget field as shown below. Notice that there is also a widget called “job_run_id” that has no default value.

jim_thorstad_19-1776187530474.png

Go ahead and run the notebook now by clicking the Run all button at the top. You should use Serverless compute for this article so you don't have to wait for a cluster to start. 

jim_thorstad_20-1776187551646.png

Notice in the output and the Catalog Explorer that one row was inserted, including the time and the user who ran the code.

jim_thorstad_21-1776187579945.png

In the workspace, click the bundle deploy button (the “rocket ship”), wait until the Bundle resources list has finished loading, and then click the Deploy button.

jim_thorstad_22-1776187600169.png

Click Deploy again to confirm your updated Job definition should be deployed.

jim_thorstad_23-1776187631019.png

Congratulations!  You should see a message that you successfully deployed to your dev target. 

Click Workspace in the side navigation to see what got deployed and where. 

jim_thorstad_24-1776187657685.png

Databricks deploys the asset bundle to your Home/.bundle folder. This is represented by the dev deployment environment highlighted in blue below. At this point, nothing is in Azure DevOps version control.

jim_thorstad_25-1776187687791.png

Click Jobs & Pipelines in the side navigation.

Your bundle deployment also created a job. Notice that the job title is prepended with [dev your_user_id].

Open the job and click the Tasks tab. Review the information below. 

jim_thorstad_26-1776187714667.png

Messages (1) and (4) state that this job was created by a bundle deployment and that you shouldn’t edit this job in the UI (you can edit it, but your edits will be wiped out the next time you deploy your bundle). The name of your notebook task (2) matches what you edited back in the Update Event Table.job.yml file. The notebook path (3) you specified in the job.yml file is being used. The job is set to run as you (5). Currently, no job parameters (6) are shown – we will work on this next. There is a Disconnect from source (7) button in the UI.

Source-linked deployment is an optional feature of bundle deployment and is enabled by default for the dev target. Your dev job references the notebook directly from your Home/bu1-team1/hello_world folder in the workspace; a copy isn’t pushed into the bundle deployment folder for dev (Home/.bundle/hello_world). This is convenient because you can make iterative fixes to your notebook and re-run your job from the Jobs UI without having to redeploy your asset bundle.
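If you ever want dev to deploy physical copies instead (for example, to test exactly what a production-mode deployment will produce), source-linked deployment can be turned off per target. The sketch below is an assumption-laden example based on the bundle presets setting; check your CLI version's reference before relying on it:

```yaml
# Hypothetical override in databricks.yml: disable source-linked deployment
# for the dev target so notebooks are copied into Home/.bundle, as in
# production-mode targets.
targets:
  dev:
    mode: development
    presets:
      source_linked_deployment: false
```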

Go back to the Runs tab and click the Run now button. Use Serverless job compute so you don’t have to wait for a job cluster to start. 

jim_thorstad_27-1776187740391.png

The job will fail because the notebook can’t find a value for the catalog variable. When you ran the notebook manually, you typed a value into the catalog widget, but when the notebook ran from a job, the catalog widget was empty, and there was no default value specified in the first cell either.

We don’t want a default value for the catalog because we intend to run this notebook across dev, stage, and prod environments. We want to send specific catalog names to the job based on the environment it's running in. Let’s configure that next.

Parameterizing the Catalog Name Passed to Your Notebook

In this section, you’ll be working in two files, which you’ll download in a moment. But let’s first understand how they work together:

  1. Bundle configuration (databricks.yml) - You’ll define a custom variable catalog_var and some environment-specific values for it (bu1_dev, bu1_stg, and bu1_prd). This will also involve defining our deployment environment targets (dev, stage, and prod).
  2. Job configuration (job.yml) - You’ll define a job parameter catalog to accept the value from the bundle variable during deployment, and we’ll configure a job task base_parameter so the job parameter is available to our notebook. 

For more information on these topics, click the links behind the terms above. Visually, the variable/parameter wiring looks like this:

jim_thorstad_28-1776187794608.png

As you can see, the bundle configuration file (databricks.yml) contains general setup information on lines 1-12 and environment targets starting on line 14. Each target (dev, staging, and prod) defines the bundle's characteristics for that environment. This includes how to access the Databricks workspace, which variable names/values to pass, and which deployment mode to use for the environment.

We define a bundle variable catalog_var and set it to bu1_dev when deploying to the “dev” target environment. We’ll set it to bu1_stg for the staging environment, and so on. These environment targets are defined by you: you can have as many as you like and name them however you like. However, each bundle target name must match the value specified in your Azure DevOps pipeline for the bundle validate and deploy steps, as you will see.
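To make that matching concrete, here is a minimal sketch of what the relevant pipeline step can look like when the pipeline runs the Databricks CLI; the target name after -t must be one of the targets defined in databricks.yml (the displayName and step layout are illustrative, not this article's final pipeline):

```yaml
# Illustrative Azure DevOps step: "-t staging" must match the "staging"
# target defined in databricks.yml, or validation will fail.
steps:
  - script: |
      databricks bundle validate -t staging
      databricks bundle deploy -t staging
    displayName: Deploy bundle to staging
```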

The deployment mode setting has two values: development and production. The dev target uses development mode, whereas staging and prod both use production mode. Development mode prefixes your deployed jobs with [dev <your_user_name>] so that they are easier to distinguish in the event you and your colleagues are working on separate copies of the same bundle/pipeline in dev. Development mode also enables source-linked deployment as previously described, making it easier to iterate on your bundle wiring and notebook code. Here’s a complete reference for the behavior of the bundle mode setting.

jim_thorstad_29-1776187864368.png

To save you time, here is the completed code for the example we are following. Copy these into your Update Event Table.job.yml and databricks.yml files.

Home/bu1-team1/hello_world/resources/Update Event Table.job.yml

Specify your email in the highlighted line below. 

Note that the code includes a second parameter to capture the job run ID so we can write it into the event table. The syntax {{job.run_id}} captures the run ID dynamically. We’re appending that to the string “job_run_id_”, which you will see when this parameter is written into the table. You can change the parameter definition and remove the prefix if you wish. Because this parameter is defined dynamically by the job, you do not need to create a bundle variable for it as you did with catalog_var.

 

resources:
  jobs:
    Update_Event_Table:
      name: Update Event Table
      queue:
        enabled: true
      parameters:
        - name: catalog
          default: ${var.catalog_var}
        - name: job_run_id
          default: "job_run_id_{{job.run_id}}"
      tasks:
        - task_key: notebook_task_1
          notebook_task:
            notebook_path: ../Hello_World.ipynb
            base_parameters:
              catalog: ${var.catalog_var}
              job_run_id: "job_run_id_{{job.run_id}}"
      email_notifications:
        on_failure:
          - your_email@example.com
      trigger:
        periodic:
          interval: 1
          unit: DAYS

Home/bu1-team1/hello_world/databricks.yml

Edit the workspace URLs to match your environment and change the catalog names if necessary. Note that we’re using the same workspace for dev and staging in this blog, but you can use three separate workspaces for dev/staging/prod if you wish. Just make the edits below.

 

# Declarative Automation Bundle (DAB) configuration; more info here: 
# https://docs.databricks.com/en/dev-tools/bundles/settings.html
bundle:
  name: "hello_world"

include:
  - resources/*.yml
  - resources/*/*.yml

# Define custom variables; more info here: 
# https://docs.databricks.com/en/dev-tools/bundles/reference#variables
variables:
  catalog_var:
    description: Custom variable containing the target catalog
    default: ""

targets:
  # TODO - adjust the catalog names and workspace URLs to match your deployment
  # The "dev" target used by the bundle deploy feature
  dev:
    mode: development
    default: true
    variables:
      catalog_var: bu1_dev
    workspace:
      host: https://adb-###########.#.azuredatabricks.net

  # The "staging" target used by Azure DevOps
  staging:
    mode: production
    variables:
      catalog_var: bu1_stg
    workspace:
      host: https://adb-###########.#.azuredatabricks.net
      root_path: /Workspace/devops/staging/bu1-team1/.bundle/${bundle.name}

  # The "prod" target used by Azure DevOps
  prod:
    mode: production
    variables:
      catalog_var: bu1_prd
    workspace:
      host: https://adb-###########.#.azuredatabricks.net
      root_path: /Workspace/devops/prod/bu1-team1/.bundle/${bundle.name}

Important: After updating the workspace URLs in your databricks.yml file, please click the reload button in your web browser. 

This is necessary because the file originally contained placeholder text (##########) for the workspace URL, and the bundle UI picked that up before your edit. This confuses it in the next step, making it think there are no valid targets to deploy to. So after making your edits and reloading the workspace UI in your web browser, continue to the next step.

Click the bundle deploy button and then click the Target pulldown menu. 

jim_thorstad_0-1776191070143.png

Notice the message saying that deployment to other workspaces is not supported. This means the bundle deploy feature is designed to deploy only locally, to the workspace you are in.

We will only use the bundle deploy button to push the bundle locally to our dev target. We will use Azure DevOps to deploy our bundles to the staging and prod environments.

Press the Escape (ESC) key to close the drop-down menu and then click the Deploy button.

jim_thorstad_1-1776191123230.png

Click Deploy again and review the output messages below.

jim_thorstad_2-1776191146067.png

After your DAB deploys to dev, go to your Jobs page and click the run button to start your job. After you start it, you can click your job and see it running.

jim_thorstad_3-1776191176189.png

When it finishes, click your job and notice that the catalog and job_run_id parameters were created and populated as expected, and your code added another row to the bu1_dev.default.test_events table.

jim_thorstad_4-1776191229954.png

With your code working in dev, let’s check it into Git. 

Click the Workspaces (the “folder”) button and take a few minutes to familiarize yourself with the new Workspace UI. Click the “All files” button and then go back to the “Folder” button next to it. Notice that this simply toggles whether you are in the DAB bundle folder. Also, notice the open file tabs on the right. It’s nice to be able to have multiple files open at one time, even in different folders/subfolders, and switch among them without having to navigate the folder structure on the left.

jim_thorstad_5-1776191258090.png

When you are ready, click the Git button next to your bundle folder.

jim_thorstad_6-1776191292941.png

Go ahead and add some comments and click Commit & Push for your first DAB project!

jim_thorstad_7-1776191317552.png

In the message at the top of the screen, click the create a pull request link. This conveniently takes you right into Azure DevOps, where you can fill in the details for your pull request.

jim_thorstad_8-1776191354102.png

Click Create to create your PR and then stop here.

Let’s go set up the Azure DevOps deployment pipeline. This way, when we go into DevOps to approve and merge your PR, we should see our new DevOps pipeline automatically deploy your initial hello world code to staging.

Part 3: Creating the Azure DevOps Deployment Pipeline for CI/CD

As a review, our CI/CD requirements are:

  1. Approved developer PRs should be merged into main and automatically deployed to staging. Here, the merged code can be tested in the broader context of the data platform, potentially alongside other approved changes awaiting promotion to prod. 
  2. Then, after further approval, the main branch is deployed to prod. 

We’ll begin by setting up the staging process, which involves creating a DevOps service connection, mapping it to an Entra ID-managed Databricks service principal (SP), creating a workspace deployment folder for staging, and granting the SP permissions on the folder and the bu1_stg catalog in Unity Catalog. We’ll repeat this process for prod. Then we’ll create the DevOps variable groups, prod environment gate/approvers, and pipeline to pull it all together. After that, we can test it out!

Here’s an overview of the steps that you’ll perform twice (once for staging and once for prod), except where noted:

  1. Where: Azure DevOps > Project Settings
     What: Create the bu1-team1 Service Connection
     Why: Facilitates a secure and flexible connection between DevOps and Azure Databricks for deploying bundles to staging and prod. For a comparison with the newer Databricks OIDC-based approach, see the topic at the end of this article.

  2. Where: Azure Portal
     What: Create the bu1-team1-cicd App Registration and federated credential
     Why: Same as step 1; the App Registration backs the service connection.

  3. Where: Databricks workspace > Settings
     What: Create the bu1-team1-cicd Service Principal
     Why: This will be the owner of the resources deployed by DevOps; you need to give this principal write access to the deployment folder and privileges in Unity Catalog.

  4. Where: Databricks workspace > Folders
     What: Grant your SP and any admin groups permissions on the /staging/bu1-team1 folder
     Why: By deploying into team-based folders, you can set permissions so each team can view their own deployed code if necessary.

  5. Where: Databricks workspace > Catalog
     What: Give the SP and other groups appropriate permissions in Unity Catalog
     Why: Tables and schemas created by the SP are owned by the SP.

  6. Where: Azure DevOps > Library
     What: Create a variable group for each stage
     Why: These hold the variable DATABRICKS_HOST with the URL to the workspace for the stage.

  7. Where: Azure DevOps > Environments
     What: Create an environment (only for “prod”)
     Why: This will be used to set up the approval gate for prod deployment.

  8. Where: Azure DevOps > Repo
     What: Create a root folder called “/.cicd” or similar for the pipeline code
     Why: This provides separation between your pipeline and bundle code. Consider requiring special approvers for code changes to files in this root folder.

  9. Where: Azure DevOps > Pipelines
     What: Create a release pipeline for your hello_world project
     Why: This will deploy your code to staging and prod after the required approvals.

  10. Test it out! Check the messages written into your event table in the staging and prod catalogs. You can consider how to organize your bundles and deployment pipelines: a pipeline can deploy multiple bundles, or you can keep them 1:1 for finer control.
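Before walking through the steps in detail, here is a sketch of what the step-9 release pipeline can look like once everything above exists. It is not the article's final pipeline: the variable group names, service connection names, environment name, and workingDirectory are assumptions based on this article's naming, and the token exchange uses the well-known Azure Databricks resource ID with the Azure CLI task.

```yaml
# Hedged sketch of a release pipeline (e.g., /.cicd/azure-pipelines.yml).
# Assumed names: variable groups bu1-team1-staging / bu1-team1-prod (each
# defining DATABRICKS_HOST), service connections bu1-team1-staging-svc-conn /
# bu1-team1-prod-svc-conn, DevOps environment "prod", bundle folder hello_world.
trigger:
  branches:
    include:
      - main

pool:
  vmImage: ubuntu-latest

stages:
  - stage: staging
    variables:
      - group: bu1-team1-staging       # holds DATABRICKS_HOST for staging
    jobs:
      - job: deploy_staging
        steps:
          - task: AzureCLI@2
            displayName: Deploy bundle to staging
            inputs:
              azureSubscription: bu1-team1-staging-svc-conn
              scriptType: bash
              scriptLocation: inlineScript
              workingDirectory: hello_world
              inlineScript: |
                # Exchange the service connection's federated credential for an
                # Azure Databricks token (well-known Databricks resource ID)
                export DATABRICKS_TOKEN=$(az account get-access-token \
                  --resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d \
                  --query accessToken --output tsv)
                export DATABRICKS_HOST="$(DATABRICKS_HOST)"
                # Install the Databricks CLI, then validate and deploy
                curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
                databricks bundle validate -t staging
                databricks bundle deploy -t staging

  - stage: prod
    dependsOn: staging
    condition: succeeded()               # prod runs only after staging succeeds
    variables:
      - group: bu1-team1-prod            # holds DATABRICKS_HOST for prod
    jobs:
      - deployment: deploy_prod
        environment: prod                # the approval gate from step 7
        strategy:
          runOnce:
            deploy:
              steps:
                - checkout: self         # deployment jobs skip checkout by default
                - task: AzureCLI@2
                  displayName: Deploy bundle to prod
                  inputs:
                    azureSubscription: bu1-team1-prod-svc-conn
                    scriptType: bash
                    scriptLocation: inlineScript
                    workingDirectory: hello_world
                    inlineScript: |
                      export DATABRICKS_TOKEN=$(az account get-access-token \
                        --resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d \
                        --query accessToken --output tsv)
                      export DATABRICKS_HOST="$(DATABRICKS_HOST)"
                      curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
                      databricks bundle deploy -t prod
```

Using a deployment job with environment: prod is what attaches the approval gate; a plain job in the prod stage would bypass it.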

 

Create the Staging Service Connection and App Registration

Go to your DevOps project’s Project settings > Pipelines > Service connections and click Create service connection.

Choose the Azure Resource Manager option and click Next.

jim_thorstad_9-1776191487263.png

Important!  For Identity type, you’ll see two options. 

Choose the App registration or managed identity (manual) option highlighted below (do not choose the one tagged as recommended). 

jim_thorstad_10-1776191527970.png

This lets you clearly name the Azure App Registration backing your DevOps Service Connection. This is important because the name of your Azure App Registration is synchronized with the name of your Databricks Service Principal, and this is what your data team will be searching for when granting permissions on your Unity Catalog and Workspace resources. 

If you choose the other option, the App Registration name will be automatically generated by Azure as <devops_org>-<project>-<GUID>, with no indication of whether it is for staging or prod. You cannot rename it, and this will make it hard to assign permissions later. The manual approach we recommend in this article involves a few more steps, but it is your best option as of April 2026.

Note: Both approaches involve creating an App registration in your Azure portal, so if you don’t have the permissions to do this, you’ll get an error and will need to work on these steps with someone who does. 

After choosing App registration or managed identity (manual), fill in the Service Connection Name (e.g., bu1-team1-staging-svc-conn), description, and your Directory (tenant) ID under the Part 1: Basics section, and then click Next.

This takes you to Part 2: App registration details.

Tip: to find your tenant ID, click your user ID circle in the DevOps portal banner bar and then click the Switch directory link. The tenant ID will be the string under the current directory.

jim_thorstad_11-1776191563270.png

At this point, you’ll need to keep the Part 2 screen open and open the Azure Portal in another tab. You can follow the steps below and see them illustrated in the image below.

  1. Create the App Registration (e.g., bu1-team1-cicd-staging-app) in the Azure portal.
  2. Copy the Application ID (formerly called client ID) and add it to your DevOps service connection Part 2 field labeled Application (client) ID.
  3. While you are there, fill in your Subscription ID. You can get the Subscription ID by searching Subscriptions in the Azure Portal.
  4. Enter your Subscription name - do not click the Verify and save button yet.
  5. Back in the Azure Portal on your App Registration page, click Manage on the left.
  6. Then click Certificates & secrets.
  7. Click the Federated credentials tab.
  8. Click Add Credential.
  9. Choose Other issuer.
  10. Copy the Issuer URL from your service connection under Part 2 and paste it in the Issuer field.
  11. Copy the Subject identifier and paste it in the Value field.
  12. Enter a name for your credential (e.g. bu1-team1-cicd-staging-fed-credential)
  13. Click Add.
  14. Back on the service connection screen, check the Grant access permission to all pipelines box. This is okay because it means all DevOps pipelines in our bu1-team DevOps project. If you are organizing things differently, you can always grant access to this connection later on.

Don’t click Verify and save yet!  You need to give your App Registration permission to read your Subscription. If you clicked it anyway, look carefully at the error message. You’ll see that your Part 1 work was saved as a draft. You can come back and complete Part 2 after you do the required steps.

jim_thorstad_12-1776191605220.png

Go to the Azure Portal and find your Subscription. Go to Access control (IAM) and click Add role assignment. Click Reader under Name and then click Next. Leave the Assign access to option set to User, group, or service principal and click the Select members link. Find your service principal (note: your App Registration previously created a service principal in the background with the same name, e.g., bu1-team1-cicd-staging-app), pick it, and click Select.

jim_thorstad_13-1776191638184.png

Click Review + Assign.

Now return to Azure DevOps and your Edit service connection Part 2 screen and click the Verify and save button.

Congratulations!  That’s a lot of steps, but it's necessary because you need to establish a trust relationship between your Azure DevOps project and your Azure Portal App Registration/Service Principal; these services do not all trust each other by default. 

Now we are going to configure trust between our Azure App Registration/Service Principal and the Databricks Service Principal that will run the resources Azure DevOps deployed in your workspace.

Create the Databricks Service Principal for CI/CD Tasks

Go to your staging workspace (in this article, we use the bu1-dev workspace). Go to Workspace Settings > Identity and access > Service principals and click Manage. Click Add service principal and then click the Add new button.

Under Management, choose Microsoft Entra ID managed, and then copy the Application ID from your Azure App Registration over. 

Note: as of April 2026, the UI shown below for the Microsoft Entra ID managed option shows a field for entering the service principal name. While this field allows you to enter and save a value, it will not be retained, as you will see when you go to search for this service principal in the permissions UI throughout Databricks.

Give it any name like bu1-team1-cicd-staging-sp (but you’ll see it will be changed back to the name of your App Registration shortly) and click Add.

jim_thorstad_14-1776191753695.png

Note: It’s possible to use a single Azure DevOps Service Connection, Azure App Registration, and Databricks Service Principal for all of your CI/CD processes (including both staging and production deployments). However, you’ll notice we’ve been including the term “staging” in the names of these CI/CD resources we’ve been creating, so you can later add the “prod” versions and have more precise governance over your CI/CD process.

Click your Databricks SP after it's created. Depending on your code, it will need the Workspace access and SQL access entitlements. I recommend using serverless jobs, especially during development, to speed up your trial-and-error CI/CD testing; with serverless jobs, you do not need to assign the Allow unrestricted cluster creation entitlement.

jim_thorstad_15-1776191786351.png

Explore the Permissions tab. This controls who can manage and use the SP, not what the SP can do in your workspace. For example, if a team might need to manually re-run a job in the staging environment, grant that team the USE permission.

Explore the Git integration tab. This feature allows your SP to connect to a Git repo, which is needed when a job is configured to pull notebook code from Git at runtime. We are not using that approach in this blog; instead, we are configuring DevOps to push the notebook code and bundle resources to each environment. Leave Git integration unconfigured for the SP.

Create the Staging Deployment Folder

In the workspace, create a staging location where the Azure DevOps pipeline will deploy your code. Then right-click this folder and choose the Share (Permissions) menu. Search for your SP and grant it Can Manage permissions, as shown below (remember: you are searching by the name of the App Registration you created in the Azure portal, since the Databricks service principal’s name syncs to it). 

If you would like your development team to see what is deployed here, grant them View permissions. Generally, developers, QA teams, and admins should not have write access to the staging bundle target deployment folder.

jim_thorstad_16-1776191820288.png
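For reference, a minimal databricks.yml tying the bundle to this setup might look like the sketch below. The bundle name, workspace hostnames, folder paths, and application IDs are illustrative placeholders, but the keys shown (targets, workspace.root_path, run_as.service_principal_name) are standard Databricks Asset Bundles configuration. Pointing root_path at the deployment folder you just created and run_as at your service principal's application ID connects the bundle deployment to the governance configured above.

```yaml
# Illustrative databricks.yml sketch; all hostnames, paths, and IDs are placeholders.
bundle:
  name: hello_world

targets:
  staging:
    mode: production
    workspace:
      host: https://adb-1111111111111111.11.azuredatabricks.net  # your staging (bu1-dev) workspace URL
      # Deploy into the staging folder you created and locked down above
      root_path: /Workspace/cicd/staging/hello_world
    run_as:
      # The application ID of the Entra-managed Databricks service principal
      service_principal_name: 00000000-0000-0000-0000-000000000000

  prod:
    mode: production
    workspace:
      host: https://adb-2222222222222222.22.azuredatabricks.net
      root_path: /Workspace/cicd/prod/hello_world
    run_as:
      service_principal_name: 11111111-1111-1111-1111-111111111111
```

With this in place, the DATABRICKS_HOST value in the pipeline's variable groups can still override or confirm the host per environment, while run_as ensures deployed jobs execute as the service principal rather than the deploying identity.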

Assigning Unity Catalog Permissions to your Service Principal

There are a few strategies for creating and managing permissions on catalogs and schemas. In this blog, we create the catalogs manually and assign Data Editor permissions to our SP. We also grant the Manage permission so our SP can drop tables and manage permissions if we were to do that in our job code. Feel free to reconsider and evolve your choices as necessary.

Give your SP permission to create data assets in your stage catalog as shown below.

jim_thorstad_17-1776191855258.png

Go into the default schema of your staging catalog (bu1_stg.default) and notice that these permissions are inherited as expected: your SP can read and write to the schema even though it is not the owner (you are the owner, since the schema was created when you made the catalog).
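If you later prefer to manage grants in code rather than through the UI, bundles can also declare grants on schema resources they create. The sketch below is illustrative: the catalog, schema, and principal values are placeholders, and it assumes your bundle owns the schema rather than using a pre-existing one.

```yaml
# Illustrative sketch: a bundle-managed schema with grants for the CI/CD SP.
# Placeholder names and IDs; assumes the bundle creates and owns this schema.
resources:
  schemas:
    team_schema:
      catalog_name: bu1_stg
      name: hello_world
      grants:
        - principal: 00000000-0000-0000-0000-000000000000  # the SP's application ID
          privileges:
            - USE_SCHEMA
            - CREATE_TABLE
            - MODIFY
            - SELECT
```

Whether you manage grants manually (as in this blog) or in the bundle, keep the approach consistent per environment so that reviews of the bundle code tell the whole permissions story.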

Create your Azure DevOps Pipeline (bu1-team1-hello-world-pipeline.yml)

We want our CI/CD pipeline to have two stages:

  1. DeployStaging - Runs automatically when code from a feature branch has been approved and merged into the main branch.
  2. DeployProd - Runs after DeployStaging and after an approval that you can configure.

We’ll use DevOps variable groups to store the workspace URLs and a DevOps environment to define the approval process that gates our prod deployment.

Go to your DevOps project > Pipelines > Library and click the + Variable group button. Name it stagingVariables (you’ll reference this in the first stage of your pipeline) and add a variable named DATABRICKS_HOST whose value is the URL of your staging workspace (bu1-dev in the example for this article). Then click Save.

Repeat the process for another variable group called prodVariables, which you will reference in the second stage of your pipeline.

jim_thorstad_18-1776191886815.png

Go to your DevOps project > Environments and click the Create environment button. Name it prod (you’ll reference this in the second stage of your pipeline), leave Resource set to None, and click Create.

Next, click the Approvals and checks tab and add an Approvals check. For this article, we’ll pick two approvers, but you can configure this as required by your organization.

jim_thorstad_19-1776191913682.png

You do not need to create a staging environment, as we plan to deploy automatically after merging to main with no conditions or approvals.

Go to your DevOps project > Repos > Files page and create a new folder (e.g., .cicd) and your Hello World pipeline inside it. 

jim_thorstad_20-1776191946563.png

Copy the following YAML into it and edit the azureSubscription values on lines 33 and 102 to match the names of your DevOps service connections (yes, it’s confusing: the field is named azureSubscription, but it expects the service connection name). Review the comments and make any other necessary adjustments.

# This triggers the pipeline on a commit to the 'main' branch
trigger:
  branches:
    include:
    - main

# Prevents the pipeline from triggering on Pull Requests; only on merges
pr: none

stages:
# -----------------------------------------------
# Stage 1: Deploy to Staging (automatic on merge)
# -----------------------------------------------
# TODO - In the bundle validate/deploy steps below, change "workingDirectory"
# to the folder containing your databricks.yml and edit the target values if
# your choice differs from "staging" used in the example below.
- stage: DeployStaging
  displayName: 'Deploy to Staging'
  variables:
    - group: stagingVariables
  condition: |
    eq(variables['Build.SourceBranch'], 'refs/heads/main')
  jobs:
  - job: deployStagingJob
    pool:
      vmImage: ubuntu-latest

    steps:
    - task: UsePythonVersion@0  
      inputs:
        versionSpec: 3.11  

    - task: AzureCLI@2
      displayName: 'Get Databricks Token (Staging)'
      inputs:
        # TODO - Change azureSubscription value to the name of your DevOps "staging" Service Connection
        azureSubscription: bu1-team1-staging-svc-conn
        scriptType: 'bash'
        scriptLocation: 'inlineScript'
        failOnStandardError: true
        # Set the token as a pipeline variable for subsequent steps; note that the "echo"
        # line does not print the token to the pipeline logs. The resource ID is the well-known
        # Azure AD application ID for Databricks (same for all tenants).
        inlineScript: |
          set -e
          DATABRICKS_TOKEN=$(az account get-access-token --resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d --query "accessToken" -o tsv)                   
          echo "##vso[task.setvariable variable=DATABRICKS_TOKEN]$DATABRICKS_TOKEN"

    # Clone the repository so the bundle files are available to the pipeline agent
    - checkout: self
      displayName: 'Checkout repository'

    # Install Python dependencies here if your bundle requires them
    # For example:
    # - script: pip install -r requirements.txt
    #   displayName: 'Install Python dependencies' 

    - script: |
        curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
        databricks --version
      displayName: 'Install Databricks CLI'

    - script: databricks bundle validate -t staging
      displayName: 'Validate bundle for the "staging" environment'
      env:
        DATABRICKS_HOST: $(DATABRICKS_HOST)
        DATABRICKS_TOKEN: $(DATABRICKS_TOKEN)
      workingDirectory: $(Build.SourcesDirectory)/hello_world

    - script: databricks bundle deploy -t staging
      displayName: 'Deploy bundle to the "staging" environment'
      env:
        DATABRICKS_HOST: $(DATABRICKS_HOST)
        DATABRICKS_TOKEN: $(DATABRICKS_TOKEN)
      workingDirectory: $(Build.SourcesDirectory)/hello_world

# -----------------------------------------------
# Stage 2: Deploy to Production (requires approval in Azure DevOps)
# -----------------------------------------------
# TODO - In the bundle validate/deploy steps below, change "workingDirectory"
# to the folder containing your databricks.yml and edit the target values if
# your choice differs from "prod" used in the example below.
- stage: DeployProd
  displayName: 'Deploy to Prod'
  dependsOn: DeployStaging
  condition: succeeded()
  variables:
    - group: prodVariables
  jobs:
  # A "deployment:" job (rather than a regular "job:") is required to
  # connect to an Azure DevOps Environment and enable the approval gate.
  - deployment: deployProdJob
    displayName: 'Deploy to Prod'
    pool:
      vmImage: ubuntu-latest
    # Setup the approval gate. The value for "environment" must match 
    # the Azure DevOps Environment name where approvals are configured.
    environment: 'prod'
    strategy:
      runOnce:
        deploy:
          steps:
          - checkout: self

          - task: UsePythonVersion@0
            displayName: 'Use Python 3.11'
            inputs:
              versionSpec: 3.11

          - task: AzureCLI@2
            displayName: 'Get Databricks Token (Prod)'
            inputs:
              # TODO - Change azureSubscription value to the name of your DevOps "prod" Service Connection
              azureSubscription: bu1-team1-prod-svc-conn
              scriptType: 'bash'
              scriptLocation: 'inlineScript'
              failOnStandardError: true
              # Set the token as a pipeline variable for subsequent steps; note that the "echo"
              # line does not print the token to the pipeline logs. The resource ID is the well-known
              # Azure AD application ID for Databricks (same for all tenants).
              inlineScript: |
                set -e
                DATABRICKS_TOKEN=$(az account get-access-token --resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d --query "accessToken" -o tsv)                   
                echo "##vso[task.setvariable variable=DATABRICKS_TOKEN]$DATABRICKS_TOKEN"

          # Install Python dependencies here if your bundle requires them
          # For example:
          # - script: pip install -r requirements.txt
          #   displayName: 'Install Python dependencies'

          - script: |
              curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
              databricks --version
            displayName: 'Install Databricks CLI'

          - script: databricks bundle validate -t prod
            displayName: 'Validate bundle for the "prod" environment'
            env:
              DATABRICKS_HOST: $(DATABRICKS_HOST)
              DATABRICKS_TOKEN: $(DATABRICKS_TOKEN)
            workingDirectory: $(Build.SourcesDirectory)/hello_world

          - script: databricks bundle deploy -t prod
            displayName: 'Deploy bundle to the "prod" environment'
            env:
              DATABRICKS_HOST: $(DATABRICKS_HOST)
              DATABRICKS_TOKEN: $(DATABRICKS_TOKEN)
            workingDirectory: $(Build.SourcesDirectory)/hello_world

Click Commit to save it to the main branch of your repo.

jim_thorstad_21-1776192316430.png

Before making the DevOps pipeline and pointing it to this file, let’s approve and merge the initial Hello World DABs pull request so there is some code to deploy. 

In your Azure DevOps project, click Repos > Pull requests and select the pull request you created earlier from the Databricks Git integration UI.

jim_thorstad_22-1776192352175.png

Click Complete and then choose your merge options before clicking the Complete merge button.

jim_thorstad_23-1776192375534.png

You can verify that your bundle code is now in the main branch.

Now let’s create the pipeline. Go to your project > Pipelines > Pipelines menu and click Create Pipeline. Choose Azure Repos Git.

jim_thorstad_24-1776192394894.png

Choose your team repository and then click Existing Azure Pipelines YAML file.

Enter the path to your pipeline file and click Continue.

jim_thorstad_25-1776192423890.png

Before you can run the pipeline, you need to either:

  1. Repeat the previous steps to create your prod service connection (bu1-team1-prod-svc-conn), your Databricks SP and deployment folder on the prod workspace, and the permissions for the prod catalog, or
  2. Comment out all Stage 2 lines in your pipeline to avoid validation errors.

Deploying to Staging

Once you have decided, click Run.

jim_thorstad_26-1776192461650.png

You do not need to set or change anything on the next screen that comes up.

jim_thorstad_27-1776192492961.png

Your Deploy to Staging job will run. The first time it runs, you will see a message asking for permission to access the staging variable group. Give it permission.

jim_thorstad_28-1776192523578.png

jim_thorstad_29-1776192550923.png

When the staging deployment finishes, find your job in the staging workspace and run it. 

jim_thorstad_30-1776192577136.png

See that the data was added by the service principal. Depending on the part of the UI, you will see either the App Registration’s App ID or its name, as shown below.

jim_thorstad_31-1776192598096.png

While the code deployment, job execution, and data writes to the table were all performed by the Azure App Registration’s SP, it was the Workspace Admin who enabled it in Workspace Settings, and the Unity Catalog owner who granted it editor rights on the schema.

jim_thorstad_32-1776192625054.png

Deploying to Production

When you are ready, click the Review button and approve running the prod deployment stage.

jim_thorstad_33-1776192653029.png

Your bundle should be deployed to the prod workspace. Find the job and run it, then check the prod catalog to see that it ran.

Congratulations! 🎉 If you made it this far, you are now an Azure Databricks CI/CD champion. 😁

Please share your thoughts about this article in the comments section. What concepts did you learn from this blog that were helpful? Is there anything you don’t understand or are unsure of? What aspects of this example should be expanded on for an intro CI/CD best-practices article? What other topics would you like to see covered in a future Technical Blog?

Comparing Two Approaches to Workload Identity Federation: Azure AD-Brokered Service Connections vs. Databricks OIDC Integration

The approach taken for integrating Azure DevOps and Databricks in this blog uses App registration-backed Azure DevOps service connections. This is a well-established pattern for securely federating your pipeline workload between the two cloud services. But Databricks recently introduced an alternative OIDC-based approach. How do these compare, and which is the best option?

Both approaches use secure, token-based protocols (OIDC and OAuth). The difference is the issuer. In the blog, I use the DevOps service connection feature, which means the DevOps pipeline (via the AzureCLI@2 task) obtains an Azure AD access token and passes it to the Databricks CLI as DATABRICKS_TOKEN. This works well when your pipeline also needs access to other Azure resources (Storage, Key Vault, etc.) through the same service principal. 

Databricks also supports a newer alternative: OAuth token federation (also called workload identity federation or OIDC). With this approach, the Azure DevOps pipeline authenticates directly to Databricks — no Azure AD intermediary, no service connections, and no explicit token retrieval step. You create a federation policy on the Databricks service principal that trusts your Azure DevOps organization's OIDC issuer, and the Databricks CLI handles the token exchange automatically using the pipeline's built-in System.AccessToken.

This does result in a simpler pipeline (no AzureCLI@2 task, no Azure service connections), but it only authenticates to Databricks — if you need access to other Azure resources, you'll still need the other service connection approach shown in this blog. To learn more, see https://learn.microsoft.com/en-us/azure/databricks/dev-tools/auth/provider-azure-devops in the Databricks documentation.
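For comparison, a sketch of what an OIDC-based deployment step might look like is shown below. This is an illustrative assumption based on the linked documentation, not a tested pipeline: in particular, the auth type value and the exact environment variable names are the parts to verify against the Databricks documentation for your CLI version.

```yaml
# Hedged sketch of an OIDC (token federation) deployment step; no AzureCLI@2
# task and no Azure service connection. Assumptions to verify against the
# Databricks docs: the DATABRICKS_AUTH_TYPE value and SYSTEM_ACCESSTOKEN usage.
- job: deployStagingOidc
  pool:
    vmImage: ubuntu-latest
  steps:
  - checkout: self
  - script: |
      curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
      databricks bundle deploy -t staging
    displayName: 'Deploy bundle via OIDC federation'
    workingDirectory: $(Build.SourcesDirectory)/hello_world
    env:
      DATABRICKS_HOST: $(DATABRICKS_HOST)
      DATABRICKS_CLIENT_ID: $(DATABRICKS_CLIENT_ID)   # the Databricks SP's application ID
      DATABRICKS_AUTH_TYPE: azure-devops-oidc         # assumption: auth type name
      SYSTEM_ACCESSTOKEN: $(System.AccessToken)       # Azure DevOps-issued OIDC token
```

The token exchange happens inside the Databricks CLI, which is why no explicit token-retrieval step appears in the pipeline.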

jim_thorstad_1-1776277670883.png

| Aspect | Approach 1: Azure AD-Brokered Authentication (This Blog) | Approach 2: Databricks-Native OIDC Federation (Alternate Option) |
| --- | --- | --- |
| Doc link | Connect to Azure using App registration with workload identity federation | Enable workload identity federation for Azure DevOps Pipelines |
| Maturity | Well-established pattern | Newer (Databricks OAuth federation) |
| Protocols used | OIDC, OAuth 2.0, bearer token | OIDC → Databricks OAuth 2.0 |
| Auth intermediary | Azure AD (token broker) | None (direct OIDC) |
| Token acquisition | Explicit az account get-access-token | Automatic, by the Databricks SDK/CLI |
| Azure CLI in pipeline | Required | Not needed |
| Azure service connections | Required (one per environment) | Not needed |
| Federation policy | On Azure AD (trusts Azure DevOps) | On the Databricks SP (trusts Azure DevOps) |
| Complexity | Two hops; 8 pipeline steps per stage | One hop; 2-3 pipeline steps per stage |
| Azure resource access | The same SP can access other Azure resources (Key Vault, etc.) | Authenticates to Databricks only |

Better when (Approach 1):

  • The pipeline also needs to interact with other Azure resources (storage, Key Vault, etc.) using the same service principal
  • Your org already has Azure AD service connections established and governed
  • You want the well-trodden, battle-tested path that most Azure shops already understand

Better when (Approach 2):

  • The pipeline only needs Databricks access
  • You want the simplest possible pipeline YAML
  • You want to eliminate Azure AD as a dependency/intermediary
  • You're on a non-Azure cloud but still using Azure DevOps (the OIDC issuer is Azure DevOps itself, not Azure AD)
  • You want the Databricks SDK to handle the token lifecycle automatically

What’s different (Approach 2): Skips Azure AD entirely. A federation policy is created directly on the Databricks service principal that trusts the Azure DevOps OIDC issuer (https://vstoken.dev.azure.com/<org_id>). The pipeline YAML is minimal:

jim_thorstad_0-1776277460580.png

Requires (Approach 2):

  • A Databricks service principal with a federation policy configured via databricks account service-principal-federation-policy create
  • Three environment variables plus $(System.AccessToken)
  • No Azure AD, no Azure service connections, no AzureCLI@2 task