Databricks CI/CD Azure DevOps
02-19-2024 12:47 AM - edited 02-19-2024 12:52 AM
Hi all,
I am looking for advice on the best approach to CI/CD in Databricks, and to repo structure in general. Is it best to have a main branch and branch off of it, or something else? How should changes be propagated from dev to QA, and then from QA to prod? Should jobs run notebooks directly from Git? Should only the dev workspace be connected to Git?
Any pointers, advice, or help are more than welcome!
Labels: Delta Lake, Spark, Workflows
09-03-2024 04:14 AM
We've encountered merge conflicts when pulling updates into the workspace with the Repos API in our release pipeline. As far as we can tell, these can only be resolved through the UI and not through the Repos API itself, so there seems to be a gap in functionality there.
For example, we use "databricks repos update" with the folder path and branch preset, but when there are conflicts with the local version we get "Error: Conflict pulling from remote", with no option to override this and take the incoming version. So I would suggest exploring whether you can configure your jobs to run off the remote code directly in your Git repo, rather than the code in the workspace, to avoid this (if you are using jobs/workflows).
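To make the "run off remote code" idea a bit more concrete, here is a rough sketch of what a job definition with a Git source could look like when built in Python. The field names (git_source, git_provider, notebook_task.source) are how I understand the Jobs API 2.1 payload, so please check them against the current docs. The job name, repo URL, branch and notebook path are made-up placeholders, and cluster settings are left out.

```python
import json


def build_git_job_settings(job_name, git_url, git_branch, notebook_path):
    """Build a job settings dict whose notebook task is read from a remote Git repo.

    Assumption: field names follow the Jobs API 2.1 payload as I understand it.
    All argument values used below are hypothetical placeholders.
    """
    return {
        "name": job_name,
        "git_source": {
            "git_url": git_url,
            # Assumed provider identifier for Azure DevOps repos.
            "git_provider": "azureDevOpsServices",
            "git_branch": git_branch,
        },
        "tasks": [
            {
                "task_key": "run_notebook",
                "notebook_task": {
                    # Path relative to the repo root, not a workspace path.
                    "notebook_path": notebook_path,
                    "source": "GIT",
                },
                # Cluster configuration intentionally omitted in this sketch.
            }
        ],
    }


if __name__ == "__main__":
    settings = build_git_job_settings(
        job_name="nightly_etl",  # hypothetical
        git_url="https://dev.azure.com/example-org/example-project/_git/example-repo",  # hypothetical
        git_branch="main",
        notebook_path="notebooks/etl",  # hypothetical
    )
    print(json.dumps(settings, indent=2))
```

With a setup like this, the job checks out the given branch at run time, so there is no workspace copy of the code that can end up in conflict with the remote.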
Also, for job definitions, check out asset bundles, which can be used to deploy workflows across workspaces with a different config per environment if wanted. But keep in mind that workflows do not directly integrate with Git anywhere, and there is no isolation from other developers making parallel changes like you have with notebooks under your private directory. So to have workflow definitions version controlled, you essentially need to make the desired changes to the dev workflow, take the JSON / YAML source, and manually save it somewhere in your repo. That can then be used by your CI/CD pipeline to run the release across environments.
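As a rough illustration of the "save the JSON in your repo and reuse it per environment" step, here is a small sketch in plain Python. It assumes the exported job looks like a Jobs API "get" response, with a "settings" object next to workspace-specific fields such as job_id. The sample job, the helper names and the override values are all made up for the example, and this is not how asset bundles work internally.

```python
import copy
import json


def extract_settings(job_response):
    """Keep only the 'settings' part of an exported job, dropping
    workspace-specific metadata such as job_id or creator_user_name."""
    return copy.deepcopy(job_response["settings"])


def apply_overrides(settings, overrides):
    """Recursively merge per-environment overrides into a copy of settings."""
    merged = copy.deepcopy(settings)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overrides(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def to_versionable_json(settings):
    """Serialize with sorted keys so diffs in the repo stay small and stable."""
    return json.dumps(settings, indent=2, sort_keys=True) + "\n"


if __name__ == "__main__":
    # Hypothetical export from the dev workspace.
    dev_response = {
        "job_id": 123,
        "creator_user_name": "dev@example.com",
        "settings": {
            "name": "nightly_etl",
            "max_concurrent_runs": 1,
            "tags": {"env": "dev"},
        },
    }
    base = extract_settings(dev_response)
    prod = apply_overrides(base, {"tags": {"env": "prod"}})
    # In a pipeline you would commit the base file and render one per environment.
    print(to_versionable_json(prod))
```

The base file is what gets committed after changing the dev workflow, and the release pipeline applies the overrides for each target workspace before creating or updating the job there.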

