Guidance Needed for Developing CI/CD Process in Databricks Using Azure DevOps
12-10-2024 10:17 PM
Hi everyone,
I am working on setting up a complete end-to-end CI/CD process for my Databricks environment using Azure DevOps. So far, I have developed a build pipeline to create a Databricks artifact (DAB).
Now, I need to create a release pipeline to deploy this artifact into production. My plan is to use the artifact from the build pipeline and the Databricks REST API to push it into production.
Questions:
- Will this approach publish workflows and notebooks into production exactly as they are in the development environment?
- Are there any best practices or recommendations for structuring the release pipeline?
I am new to this and would appreciate any suggestions.
Below is the code I’m currently using in the release pipeline.
Release Pipeline Code:
# Define Databricks variables
$databricksUrl = "<Databricks-URL>"   # Replace with your Databricks instance URL
$accessToken   = "<Access-Token>"     # Replace with your secure token

# Define headers for Databricks REST API
$headers = @{ "Authorization" = "Bearer $accessToken" }

# Paths inside the Databricks workspace
$workspaceBasePath = ""
$notebookPath      = ""
$jobPath           = ""

# Function to create directories in Databricks
function Create-Directory {
    param ([string]$directoryPath)
    $createDirUri = "$databricksUrl/api/2.0/workspace/mkdirs"
    $body = @{ "path" = $directoryPath }
    try {
        Invoke-RestMethod -Method POST -Uri $createDirUri -Headers $headers -Body ($body | ConvertTo-Json -Depth 10) -ContentType "application/json"
        Write-Output "Directory '$directoryPath' created successfully in Databricks."
    } catch {
        if ($_.Exception.Response.StatusCode -ne 400) {
            Write-Error "Failed to create directory '$directoryPath': $_"
        }
    }
}

# Additional functions (Delete-File, Import-Notebook, Import-Job) are implemented similarly
# to handle file deletions and imports.

# Example pipeline steps:
Create-Directory -directoryPath "$workspaceBasePath/notebooks"
Create-Directory -directoryPath "$workspaceBasePath/jobs"
Delete-File -filePath "$workspaceBasePath/notebooks/Contingent_Employee_Report"
Delete-File -filePath "$workspaceBasePath/jobs/job-config.json"
Import-Notebook -notebookPath $notebookPath -workspacePath "$workspaceBasePath/notebooks/Contingent_Employee_Report"
Import-Job -jobConfigJsonPath $jobPath
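For completeness, the omitted helpers look roughly like the sketch below, built on the standard Workspace and Jobs REST endpoints. The notebook language, the error handling, and the assumption that the job-config JSON is a full Jobs 2.1 create payload are simplifications, not my exact implementation:

# Minimal sketch only; reuses $databricksUrl and $headers from the script above.
function Delete-File {
    param ([string]$filePath)
    $body = @{ "path" = $filePath; "recursive" = $false }
    try {
        Invoke-RestMethod -Method POST -Uri "$databricksUrl/api/2.0/workspace/delete" `
            -Headers $headers -Body ($body | ConvertTo-Json) -ContentType "application/json"
    } catch {
        # Ignore "not found" errors so the first deployment does not fail
        Write-Output "Skipping delete of '$filePath': $_"
    }
}

function Import-Notebook {
    param ([string]$notebookPath, [string]$workspacePath)
    # The workspace import API expects base64-encoded source content
    $content = [Convert]::ToBase64String([IO.File]::ReadAllBytes($notebookPath))
    $body = @{
        "path"      = $workspacePath
        "format"    = "SOURCE"
        "language"  = "PYTHON"      # assumption: adjust to SCALA/SQL/R as needed
        "content"   = $content
        "overwrite" = $true
    }
    Invoke-RestMethod -Method POST -Uri "$databricksUrl/api/2.0/workspace/import" `
        -Headers $headers -Body ($body | ConvertTo-Json) -ContentType "application/json"
}

function Import-Job {
    param ([string]$jobConfigJsonPath)
    # Assumption: the JSON file holds the complete job settings for the Jobs 2.1 create API
    $jobSettings = Get-Content -Raw -Path $jobConfigJsonPath
    Invoke-RestMethod -Method POST -Uri "$databricksUrl/api/2.1/jobs/create" `
        -Headers $headers -Body $jobSettings -ContentType "application/json"
}

One caveat I'm aware of: calling jobs/create on every release registers a new job each time, so a repeatable deployment would normally look the job up first and call jobs/reset to update it in place.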
Thank you in advance for your time and suggestions!
Labels: Workflows
12-10-2024 11:54 PM - edited 12-10-2024 11:56 PM
Hi @alcatraz96 ,
One question: why don't you use Databricks Asset Bundles? Then the whole process would be much simpler 🙂
Here you have a good end-to-end example:
CI/CD Integration with Databricks Workflows - Databricks Community - 81821
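To give you a feel for it: once the bundle (databricks.yml plus your notebooks and job definitions) is in the repo, the release stage boils down to a couple of CLI calls. Something like the sketch below, where the target name prod is an assumption based on a typical bundle configuration:

# Illustrative only - run from the directory containing databricks.yml,
# e.g. inside a script step of your Azure DevOps release stage.
databricks bundle validate -t prod   # check the bundle definition against the prod target
databricks bundle deploy -t prod     # upload notebooks and create/update the workflows

The deploy command takes care of importing notebooks and creating or updating the workflows defined in the bundle, so you don't need the per-file REST calls at all.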
12-11-2024 03:02 AM
Thank you for the suggestion. Is there a way to achieve this without using an Azure VM? I'm just curious.
12-11-2024 03:08 AM
Hi @alcatraz96 ,
Yes, of course. If you're not using a VNet-injected workspace deployment with SCC (secure cluster connectivity) enabled, your workspace should be accessible from the public internet. If that's the case, you can use Azure-hosted agents. A self-hosted machine is only needed when you have closed public access to the workspace.
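As a rough illustration (the pipeline variable names here are placeholders I made up, not anything Azure DevOps provides by default), a script step on a Microsoft-hosted agent could authenticate and deploy like this:

# Sketch for a PowerShell script step on a Microsoft-hosted agent (e.g. windows-latest).
# The Databricks CLI reads these environment variables for authentication.
$env:DATABRICKS_HOST          = "<Databricks-URL>"      # workspace URL
$env:DATABRICKS_CLIENT_ID     = "$(sp-client-id)"       # service principal ID from a secret pipeline variable (assumed name)
$env:DATABRICKS_CLIENT_SECRET = "$(sp-client-secret)"   # service principal secret (assumed name)

databricks bundle deploy -t prod

A personal access token works the same way via DATABRICKS_TOKEN; the only requirement is that the agent can reach the workspace URL over the public internet.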