Databricks CI/CD Azure Devops

Stellar

Hi all,

I am looking for advice on the best approach to CI/CD in Databricks, and to repo structure in general. Should we have a main branch and create branches off of it, or something else? How should changes be promoted from dev to QA, and then from QA to prod? Should jobs run notebooks from Git? Should only the dev workspace be connected to Git?

Any pointers, advice, or help would be more than welcome!

 


In our release pipeline we've run into merge conflicts when pulling updates into the workspace with the Repos API. As far as we can tell, these can only be resolved through the UI and not through the Repos API itself, so some functionality seems to be missing there.

For example, we run "databricks repos update" with the folder path and branch preset, but when the workspace copy conflicts with the remote we get "Error: Conflict pulling from remote", with no option to override it and take the incoming version. So, if you are using jobs/workflows, I would suggest configuring your jobs to run code directly from the remote repo rather than from the copy in the workspace, which avoids this problem.
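A minimal sketch of what "run off remote code" can look like, assuming the Jobs API 2.1 job settings format: the job gets a "git_source" block, and the notebook task uses "source": "GIT", so the notebook is fetched from the branch at run time instead of from a workspace repo folder. The job name, repo URL, branch, notebook path, and cluster ID below are hypothetical placeholders; the resulting dictionary is the kind of body you could send to POST /api/2.1/jobs/create.

```python
import json


def build_git_job_spec(job_name, repo_url, branch, notebook_path, cluster_id):
    """Build a Jobs API 2.1 job definition whose task runs a notebook from Git.

    With "source": "GIT", notebook_path is relative to the repository root and
    the code is checked out from `branch` when the job runs, so the workspace
    copy of the repo (and its pull conflicts) is not involved.
    """
    return {
        "name": job_name,
        "git_source": {
            "git_url": repo_url,
            "git_provider": "azureDevOpsServices",  # Azure DevOps Repos
            "git_branch": branch,
        },
        "tasks": [
            {
                "task_key": "main",
                "existing_cluster_id": cluster_id,
                "notebook_task": {
                    "notebook_path": notebook_path,  # relative, no leading slash
                    "source": "GIT",
                },
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    spec = build_git_job_spec(
        job_name="etl-qa",
        repo_url="https://dev.azure.com/my-org/my-project/_git/my-repo",
        branch="release",
        notebook_path="notebooks/etl",
        cluster_id="0000-000000-example",
    )
    print(json.dumps(spec, indent=2))
```

Pointing each environment's job at a different branch or tag (for example a release branch for QA and main for prod) is one way to promote code without ever pulling into a workspace folder.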

Also, for the job definitions, check out Databricks Asset Bundles, which can be used to deploy workflows across workspaces, with different configurations per environment if needed. Keep in mind, though, that workflows do not integrate directly with Git anywhere, and unlike notebooks under your private directory, there is no isolation from other developers making parallel changes. So to keep workflow definitions under version control, you essentially need to make the desired changes to the dev workflow, take its JSON/YAML source, and manually save it somewhere in your repo. Your CI/CD pipeline can then use that file to run the release across environments.
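The "save the JSON in the repo, then release it from CI/CD" step could be sketched roughly as below. This assumes the saved file is the response of GET /api/2.1/jobs/get (which wraps the job definition in a "settings" object) and that the release targets POST /api/2.1/jobs/reset, which takes a "job_id" and a "new_settings" object and replaces the job's settings wholesale. The ENVIRONMENTS table, its branch names and cluster IDs, and the promote_job helper are hypothetical; Asset Bundles handle this kind of per-environment override for you.

```python
import copy
import json

# Hypothetical per-environment overrides; replace with your own values.
ENVIRONMENTS = {
    "qa": {"git_branch": "release", "cluster_id": "qa-cluster-id"},
    "prod": {"git_branch": "main", "cluster_id": "prod-cluster-id"},
}


def promote_job(exported, env, target_job_id):
    """Turn a job definition exported from dev into a jobs/reset payload.

    `exported` is the saved GET /api/2.1/jobs/get response; only its
    "settings" object is reused. The input is deep-copied, not modified.
    """
    overrides = ENVIRONMENTS[env]
    settings = copy.deepcopy(exported["settings"])

    # Point Git-sourced tasks at the branch used by the target environment.
    if "git_source" in settings:
        settings["git_source"]["git_branch"] = overrides["git_branch"]

    # Swap dev cluster references for the target environment's cluster.
    for task in settings.get("tasks", []):
        if "existing_cluster_id" in task:
            task["existing_cluster_id"] = overrides["cluster_id"]

    # Body for POST /api/2.1/jobs/reset on the target workspace's job.
    return {"job_id": target_job_id, "new_settings": settings}


if __name__ == "__main__":
    # Hypothetical export saved in the repo after editing the dev workflow.
    exported = {
        "job_id": 1,
        "settings": {
            "name": "etl",
            "git_source": {
                "git_url": "https://dev.azure.com/my-org/my-project/_git/my-repo",
                "git_provider": "azureDevOpsServices",
                "git_branch": "dev",
            },
            "tasks": [{"task_key": "main", "existing_cluster_id": "dev-cluster-id"}],
        },
    }
    print(json.dumps(promote_job(exported, "prod", 42), indent=2))
```

Using jobs/reset against an existing job ID in each workspace keeps job IDs stable across releases; creating a fresh job each time would work too but leaves old jobs behind.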
