09-27-2021 07:04 AM
I'm using the new Databricks Repos functionality. In the Azure Data Factory UI, the notebook activity lets you browse the Databricks workspace and select Repos > username > project > folder > notebook.
Is it possible to call a Databricks notebook in a specific branch from Data Factory?
09-27-2021 07:28 AM
Hi @baatchus! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the community have an answer to your question first. Or else I will follow up with my team and get back to you soon. Thanks.
09-27-2021 11:43 AM
I believe the notebook runs from whichever branch is currently checked out in the repo, as set via the UI or the Repos REST API.
In your workflow I would think that you will need to call the REST API to change the branch of the repo, and then execute the notebook. For example, this may be a situation where you create a release branch and want to dynamically change the code that is executing. You may need to make a pull request as well, but I am not certain.
Let me know if this helps. I can provide more detail.
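For reference, here is a minimal sketch of the branch switch described above. It assumes the Databricks Repos REST API (`PATCH /api/2.0/repos/{repo_id}` with a `branch` field); the workspace URL, token and repo ID are placeholders, and the function name is made up for illustration. The request is only built here, not sent.

```python
import json
import urllib.request


def build_branch_update_request(host, token, repo_id, branch):
    """Build (but do not send) a request that checks out `branch` in a repo."""
    url = f"{host.rstrip('/')}/api/2.0/repos/{repo_id}"
    body = json.dumps({"branch": branch}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values; replace with your own workspace URL, token and repo ID.
req = build_branch_update_request(
    "https://adb-0000000000000000.0.azuredatabricks.net",
    "dapi-EXAMPLE",
    123,
    "release",
)
# To actually send it: urllib.request.urlopen(req)
```

From Data Factory, the same call could presumably be made from a Web activity placed before the notebook activity.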
09-28-2021 02:08 AM
Data Factory will use the branch that was set in the Databricks web UI.
So you can go the REST API route, but what we do is just use different folders.
You can create folders in Repos, and under each folder you can check out a different branch
(e.g. your own user folder for dev, and another one for production/QA etc.).
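To illustrate the folder-per-branch idea, here is a small sketch that maps an environment name to the Repos folder pinned to the matching branch. The folder names, the user name and the `notebook_path` helper are hypothetical, chosen only for this example.

```python
# Hypothetical mapping: each Repos folder has one branch checked out.
REPO_FOLDERS = {
    "dev": "/Repos/some.user@example.com",  # personal folder, feature branches
    "qa": "/Repos/QA",                      # pinned to a QA/release branch
    "prod": "/Repos/Production",            # pinned to main
}


def notebook_path(env, project, notebook):
    """Return the workspace path of `notebook` for the given environment."""
    return f"{REPO_FOLDERS[env]}/{project}/{notebook}"


print(notebook_path("prod", "project", "folder/notebook"))
# prints /Repos/Production/project/folder/notebook
```

A path built this way could be passed to the Data Factory notebook activity as a parameter, so the same pipeline runs against a different folder (and therefore branch) per environment.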
09-28-2021 04:59 AM
OK, so you create a top-level folder called Production, sync the repo into it, and keep it on the main/master branch.
Do you call the REST API from Data Factory to update the branch before running the notebook activity?
Do you parameterize the notebook path in Data Factory for the top-level folder in Databricks Repos?
What does a typical workflow look like? Have you configured Databricks Repos with top-level folders in all environments (Dev, Stage, Production)?
So many questions, but so little documentation and guidance from Databricks/Microsoft on this part.
09-28-2021 05:10 AM
Exactly.
We have a folder called 'Production' which is always set to main.
The only thing we ever do in this folder is pull the (approved) changes from git.
Data Factory uses this path for executing notebooks.
You can do a REST call to make sure the branch is set to main, but we do not do that as we are disciplined enough not to mess with it :)
You can do the same for QA (if it resides in the same Databricks workspace).
Further, each developer has their own folder. In this folder they can create branches and make commits. You can only have one branch with open changes, so if you want to switch branches you first have to commit.
Why not a single 'Development' folder instead of one per developer? Simple: each folder can only have one branch active. If multiple people work in one folder, it becomes a big mess!
If you have multiple workspaces, the same principles apply: a folder per developer and one 'golden' branch which gets promoted to the other workspaces.
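If you do want the optional safety check mentioned above, a sketch could look like the following. It assumes the response of `GET /api/2.0/repos/{repo_id}` contains a `branch` field; the sample payload and the `is_on_branch` helper are illustrative, not taken from a real workspace.

```python
import json


def is_on_branch(repo_info, expected="main"):
    """Return True if the repo description reports the expected branch."""
    return repo_info.get("branch") == expected


# Illustrative payload in the shape the Repos API is assumed to return.
sample = json.loads(
    '{"id": 123, "path": "/Repos/Production/project", "branch": "main"}'
)

if not is_on_branch(sample):
    raise RuntimeError("Production folder is not on main")
```

A pipeline could run a check like this first and fail early instead of executing notebooks from the wrong branch.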
12-20-2022 06:58 AM
Greetings, I have a similar problem.
Did you try using Databricks Workflows and scheduling the notebooks there instead of in Data Factory?
Because inside workflows it is possible to select a specific branch, so it may actually work.
What do you think?
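As a rough sketch of that idea: the Jobs API 2.1 accepts a `git_source` block on a job, so a job definition pinned to a branch might be built as below. The repository URL, cluster ID, job name and the `build_job_settings` helper are placeholders, and the exact fields should be checked against the Jobs API documentation.

```python
def build_job_settings(name, git_url, git_provider, branch, notebook, cluster_id):
    """Build a job definition whose notebook is read from a git branch."""
    return {
        "name": name,
        "git_source": {
            "git_url": git_url,
            "git_provider": git_provider,
            "git_branch": branch,
        },
        "tasks": [
            {
                "task_key": "run_notebook",
                "existing_cluster_id": cluster_id,
                "notebook_task": {
                    # Path relative to the repository root.
                    "notebook_path": notebook,
                    "source": "GIT",
                },
            }
        ],
    }


# Placeholder values for illustration only.
settings = build_job_settings(
    "nightly-run",
    "https://github.com/example-org/example-repo",
    "gitHub",
    "main",
    "folder/notebook",
    "0000-000000-example",
)
```

The resulting dictionary would be sent as the JSON body of a job-creation request.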