Guidance Needed for Developing CI/CD Process in Databricks Using Azure DevOps
12-10-2024 10:17 PM
Hi everyone,
I am working on setting up a complete end-to-end CI/CD process for my Databricks environment using Azure DevOps. So far, I have developed a build pipeline to create a Databricks artifact (DAB).
Now, I need to create a release pipeline to deploy this artifact into production. My plan is to use the artifact from the build pipeline and the Databricks REST API to push it into production.
Questions:
- Will this approach publish workflows and notebooks into production exactly as they are in the development environment?
- Are there any best practices or recommendations for structuring the release pipeline?
I am new to this and would appreciate any suggestions.
Below is the code I’m currently using in the release pipeline.
Release Pipeline Code:
# Define Databricks variables
$databricksUrl = "<Databricks-URL>"   # Replace with your Databricks instance URL
$accessToken   = "<Access-Token>"     # Replace with your secure token

# Define headers for Databricks REST API
$headers = @{ "Authorization" = "Bearer $accessToken" }

# Paths inside the Databricks workspace
$workspaceBasePath = ""
$notebookPath      = ""
$jobPath           = ""

# Function to create directories in Databricks
function Create-Directory {
    param ([string]$directoryPath)
    $createDirUri = "$databricksUrl/api/2.0/workspace/mkdirs"
    $body = @{ "path" = $directoryPath }
    try {
        Invoke-RestMethod -Method POST -Uri $createDirUri -Headers $headers -Body ($body | ConvertTo-Json -Depth 10) -ContentType "application/json"
        Write-Output "Directory '$directoryPath' created successfully in Databricks."
    } catch {
        if ($_.Exception.Response.StatusCode -ne 400) {
            Write-Error "Failed to create directory '$directoryPath': $_"
        }
    }
}

# Additional functions (Delete-File, Import-Notebook, Import-Job) are implemented similarly
# to handle file deletions and imports.

# Example pipeline steps:
Create-Directory -directoryPath "$workspaceBasePath/notebooks"
Create-Directory -directoryPath "$workspaceBasePath/jobs"
Delete-File -filePath "$workspaceBasePath/notebooks/Contingent_Employee_Report"
Delete-File -filePath "$workspaceBasePath/jobs/job-config.json"
Import-Notebook -notebookPath $notebookPath -workspacePath "$workspaceBasePath/notebooks/Contingent_Employee_Report"
Import-Job -jobConfigJsonPath $jobPath
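For completeness, the omitted helpers look roughly like the sketch below, built on the standard Workspace and Jobs REST endpoints. The notebook language, the error handling, and the assumption that the job-config JSON is a full Jobs 2.1 create payload are simplifications, not my exact implementation:

# Minimal sketch only; reuses $databricksUrl and $headers from the script above.
function Delete-File {
    param ([string]$filePath)
    $body = @{ "path" = $filePath; "recursive" = $false }
    try {
        Invoke-RestMethod -Method POST -Uri "$databricksUrl/api/2.0/workspace/delete" `
            -Headers $headers -Body ($body | ConvertTo-Json) -ContentType "application/json"
    } catch {
        # Ignore "not found" errors so the first deployment does not fail
        Write-Output "Skipping delete of '$filePath': $_"
    }
}

function Import-Notebook {
    param ([string]$notebookPath, [string]$workspacePath)
    # The workspace import API expects base64-encoded source content
    $content = [Convert]::ToBase64String([IO.File]::ReadAllBytes($notebookPath))
    $body = @{
        "path"      = $workspacePath
        "format"    = "SOURCE"
        "language"  = "PYTHON"      # assumption: adjust to SCALA/SQL/R as needed
        "content"   = $content
        "overwrite" = $true
    }
    Invoke-RestMethod -Method POST -Uri "$databricksUrl/api/2.0/workspace/import" `
        -Headers $headers -Body ($body | ConvertTo-Json) -ContentType "application/json"
}

function Import-Job {
    param ([string]$jobConfigJsonPath)
    # Assumption: the JSON file holds the complete job settings for the Jobs 2.1 create API
    $jobSettings = Get-Content -Raw -Path $jobConfigJsonPath
    Invoke-RestMethod -Method POST -Uri "$databricksUrl/api/2.1/jobs/create" `
        -Headers $headers -Body $jobSettings -ContentType "application/json"
}

One caveat I'm aware of: calling jobs/create on every release registers a new job each time, so a repeatable deployment would normally look the job up first and call jobs/reset to update it in place.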
Thank you in advance for your time and suggestions!
Labels: Workflows
12-10-2024 11:54 PM - edited 12-10-2024 11:56 PM
Hi @alcatraz96 ,
One question: why don't you use Databricks Asset Bundles? Then the whole process would be much simpler 🙂
Here you have a good end-to-end example:
CI/CD Integration with Databricks Workflows - Databricks Community - 81821
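To give you a feel for it: once the bundle (databricks.yml plus your notebooks and job definitions) is in the repo, the release stage boils down to a couple of CLI calls. Something like the sketch below, where the target name prod is an assumption based on a typical bundle configuration:

# Illustrative only - run from the directory containing databricks.yml,
# e.g. inside a script step of your Azure DevOps release stage.
databricks bundle validate -t prod   # check the bundle definition against the prod target
databricks bundle deploy -t prod     # upload notebooks and create/update the workflows

The deploy command takes care of importing notebooks and creating or updating the workflows defined in the bundle, so you don't need the per-file REST calls at all.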
12-11-2024 03:02 AM
Thank you for the suggestion. Is there a way to achieve this without using an Azure VM? I'm just curious.
12-11-2024 03:08 AM
Hi @alcatraz96 ,
Yes, of course. If you're not using a VNet-injected workspace deployment with SCC (secure cluster connectivity) enabled, your workspace should be accessible from the public internet. If that's the case, you can use Azure-hosted agents. A self-hosted machine is only needed when you have closed public access to the workspace.
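As a rough illustration (the pipeline variable names here are placeholders I made up, not anything Azure DevOps provides by default), a script step on a Microsoft-hosted agent could authenticate and deploy like this:

# Sketch for a PowerShell script step on a Microsoft-hosted agent (e.g. windows-latest).
# The Databricks CLI reads these environment variables for authentication.
$env:DATABRICKS_HOST          = "<Databricks-URL>"      # workspace URL
$env:DATABRICKS_CLIENT_ID     = "$(sp-client-id)"       # service principal ID from a secret pipeline variable (assumed name)
$env:DATABRICKS_CLIENT_SECRET = "$(sp-client-secret)"   # service principal secret (assumed name)

databricks bundle deploy -t prod

A personal access token works the same way via DATABRICKS_TOKEN; the only requirement is that the agent can reach the workspace URL over the public internet.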