Guidance Needed for Developing CI/CD Process in Databricks Using Azure DevOps

alcatraz96
New Contributor

Hi everyone,

I am working on setting up a complete end-to-end CI/CD process for my Databricks environment using Azure DevOps. So far, I have developed a build pipeline to create a Databricks artifact (DAB).

[Attached image: alcatraz96_1-1733897791930.png]

Now, I need to create a release pipeline to deploy this artifact into production. My plan is to use the artifact from the build pipeline and the Databricks REST API to push it into production.

Questions:

  1. Will this approach publish workflows and notebooks into production exactly as they are in the development environment?
  2. Are there any best practices or recommendations for structuring the release pipeline?

I am new to this and would appreciate any suggestions.

Below is the code I’m currently using in the release pipeline.


Release Pipeline Code:

# Define Databricks variables
$databricksUrl = "<Databricks-URL>" # Replace with your Databricks instance URL
$accessToken = "<Access-Token>" # Replace with your secure token

# Define headers for Databricks REST API
$headers = @{
    "Authorization" = "Bearer $accessToken"
}

# Paths inside the Databricks workspace
$workspaceBasePath = ""
$notebookPath = ""
$jobPath = ""

# Function to create directories in Databricks
function Create-Directory {
    param ([string]$directoryPath)
    $createDirUri = "$databricksUrl/api/2.0/workspace/mkdirs"
    $body = @{ "path" = $directoryPath }
    
    try {
        Invoke-RestMethod -Method POST -Uri $createDirUri -Headers $headers -Body ($body | ConvertTo-Json -Depth 10) -ContentType "application/json"
        Write-Output "Directory '$directoryPath' created successfully in Databricks."
    } catch {
        if ($_.Exception.Response.StatusCode -ne 400) {
            Write-Error "Failed to create directory '$directoryPath': $_"
        }
    }
}

# Additional functions (Delete-File, Import-Notebook, Import-Job) are implemented similarly to handle file deletions and imports.

# Example pipeline steps:
Create-Directory -directoryPath "$workspaceBasePath/notebooks"
Create-Directory -directoryPath "$workspaceBasePath/jobs"

Delete-File -filePath "$workspaceBasePath/notebooks/Contingent_Employee_Report"
Delete-File -filePath "$workspaceBasePath/jobs/job-config.json"

Import-Notebook -notebookPath $notebookPath -workspacePath "$workspaceBasePath/notebooks/Contingent_Employee_Report"
Import-Job -jobConfigJsonPath $jobPath
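
For reference, here is a minimal sketch of what the elided Import-Notebook function could look like, assuming it calls the workspace import endpoint (POST /api/2.0/workspace/import) and reuses the $databricksUrl and $headers variables defined above. The helper New-NotebookImportBody and its parameter names are hypothetical and only for illustration. The sketch also assumes the notebook is a plain source file and that "PYTHON" is the right default language.

```powershell
# Hypothetical helper: builds the request body for POST /api/2.0/workspace/import.
function New-NotebookImportBody {
    param (
        [Parameter(Mandatory)] [string]$LocalPath,      # notebook source file from the build artifact
        [Parameter(Mandatory)] [string]$WorkspacePath,  # target path inside the Databricks workspace
        [string]$Language = "PYTHON"                    # assumption: adjust to SQL, SCALA or R as needed
    )
    # Resolve to an absolute path first, because .NET file APIs ignore the PowerShell location.
    $fullPath = (Resolve-Path -LiteralPath $LocalPath).ProviderPath
    $bytes = [System.IO.File]::ReadAllBytes($fullPath)
    return @{
        path      = $WorkspacePath
        format    = "SOURCE"
        language  = $Language
        content   = [System.Convert]::ToBase64String($bytes)  # the endpoint expects base64 content
        overwrite = $true                                     # replace an existing notebook in place
    }
}

# Sketch of Import-Notebook; relies on $databricksUrl and $headers from the script above.
function Import-Notebook {
    param ([string]$notebookPath, [string]$workspacePath)
    $importUri = "$databricksUrl/api/2.0/workspace/import"
    $body = New-NotebookImportBody -LocalPath $notebookPath -WorkspacePath $workspacePath

    try {
        Invoke-RestMethod -Method POST -Uri $importUri -Headers $headers -Body ($body | ConvertTo-Json) -ContentType "application/json" | Out-Null
        Write-Output "Notebook imported to '$workspacePath'."
    } catch {
        Write-Error "Failed to import notebook to '$workspacePath': $_"
    }
}
```

With overwrite set to true, the separate Delete-File call for the notebook may not be needed, since the import replaces the existing object.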

Thank you in advance for your time and suggestions!

3 REPLIES

szymon_dybczak
Contributor III

Hi @alcatraz96 ,

One question: why not use Databricks Asset Bundles? The whole process would be much simpler 🙂
Here is a good end-to-end example:

CI/CD Integration with Databricks Workflows - Databricks Community - 81821
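
To give a rough idea of how short a bundle-based release step can be, here is a minimal sketch in PowerShell. It assumes the newer Databricks CLI (the one that provides the `bundle` commands) is already installed on the agent, that the working directory contains the bundle's databricks.yml, and that a target named "prod" is defined there. The target name and the placeholder values are assumptions, not something taken from the linked example.

```powershell
# Assumption: these values come from secret pipeline variables, not hard-coded strings.
$env:DATABRICKS_HOST  = "<Databricks-URL>"   # placeholder, same as in the script above
$env:DATABRICKS_TOKEN = "<Access-Token>"     # placeholder, same as in the script above

$target = "prod"   # hypothetical target name defined in databricks.yml

# Check the bundle configuration before touching the workspace.
databricks bundle validate -t $target
if ($LASTEXITCODE -ne 0) { throw "Bundle validation failed for target '$target'." }

# Deploy the notebooks and job definitions described by the bundle.
databricks bundle deploy -t $target
if ($LASTEXITCODE -ne 0) { throw "Bundle deployment failed for target '$target'." }
```

The exit-code checks make the pipeline step fail as soon as validation or deployment fails, instead of continuing silently.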

Thank you for the suggestion. Is there a way to achieve this without using an Azure VM? I'm just curious.

Hi @alcatraz96 ,

Yes, of course. If you're not using a VNet-injected workspace deployment with SCC (secure cluster connectivity) enabled, your workspace should be accessible from the public internet. In that case you can use Azure-hosted agents. A self-hosted machine is needed only when public access to the workspace is closed.
