Call Databricks notebook in a specific branch from Azure Data Factory?

baatchus
New Contributor III

I'm using the new Databricks Repos functionality and in Azure Data Factory UI for the notebook activity you can browse the Databricks workspace and select Repos > username > project > folder > notebook.

Is it possible to call a Databricks notebook in a specific branch from Data Factory?

6 REPLIES

Kaniz
Community Manager

Hi @baatchus! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer first. Otherwise I will follow up with my team and get back to you soon. Thanks.

Ryan_Chynoweth
Honored Contributor III

I believe the notebook runs from whichever branch is currently checked out in the repo, set either via the UI or via the Repos REST API.

In your workflow, I would think you need to call the REST API to change the branch of the repo and then execute the notebook. For example, this may be a situation where you create a release branch and want to dynamically change the code that is executing. You may need to make a pull request as well, but I'm not certain.
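As a rough sketch of that "switch the branch, then run the notebook" step: the Repos API accepts a `PATCH` on `/api/2.0/repos/{repo_id}` with the target branch in the body. The host, token, repo ID and branch name below are placeholders, and the helper names are made up for illustration.

```python
import json
import urllib.request


def build_branch_update_request(host: str, token: str, repo_id: int, branch: str) -> urllib.request.Request:
    """Build the PATCH request that checks out `branch` in the given repo."""
    url = f"{host.rstrip('/')}/api/2.0/repos/{repo_id}"
    body = json.dumps({"branch": branch}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def set_repo_branch(host: str, token: str, repo_id: int, branch: str) -> dict:
    """Send the request and return the repo description from the response."""
    req = build_branch_update_request(host, token, repo_id, branch)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Hypothetical usage (placeholder values, needs a real workspace and token):
# set_repo_branch("https://<workspace-url>", "<token>", 123, "release")
```

Whatever runs before the notebook activity would need to issue this call and wait for it to succeed, since the checkout affects everyone using that repo folder.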

Let me know if this helps. I can provide more detail.

-werners-
Esteemed Contributor III

Data Factory will use whichever branch is currently checked out in the Databricks web UI.

So you can go the REST API route, but what we do is just use different folders.

You can create folders in Repos, and under each folder you can check out a different branch.

(e.g. your own user folder for dev, and another one for production/QA, etc.)

baatchus
New Contributor III

OK, so you create a top-level folder called Production, sync the repo there, and keep it on the main/master branch.

Do you call the REST API from Data Factory to update the branch before running the notebook activity?

Do you parameterize the notebook path in Data Factory for the top-level folder in Databricks Repos?

What does a typical workflow look like? Have you configured Databricks Repos with top-level folders in all environments (Dev, Stage, Production)?

So many questions, but so little documentation and guidance from Databricks/Microsoft on this part.

-werners-
Esteemed Contributor III

Exactly.

We have a folder called 'Production' which is set to main, always.

The only time we do something in this folder is to pull the (approved) changes from git.

Data Factory uses this path for executing notebooks.

You can do a REST call to make sure the branch is set to main, but we do not do that, as we are disciplined enough not to mess with it 🙂
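For anyone who does want that safety check, here is a minimal sketch. It assumes the Repos API's `GET /api/2.0/repos/{repo_id}` response carries a `branch` field; the host, token and repo ID are placeholders and the function names are made up for illustration.

```python
import json
import urllib.request


def get_repo_info(host: str, token: str, repo_id: int) -> dict:
    """Fetch the repo description (path, branch, ...) from the workspace."""
    url = f"{host.rstrip('/')}/api/2.0/repos/{repo_id}"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def is_on_branch(repo_info: dict, expected: str = "main") -> bool:
    """True only if the repo folder currently has `expected` checked out."""
    return repo_info.get("branch") == expected


# Hypothetical usage (placeholder values, needs a real workspace and token):
# info = get_repo_info("https://<workspace-url>", "<token>", 123)
# if not is_on_branch(info, "main"):
#     raise RuntimeError(f"Production folder is on {info.get('branch')!r}")
```

Failing the pipeline when the check is false is one option; resetting the branch automatically is another, but that hides the fact that someone touched the folder.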

You can do the same for QA (if it resides in the same Databricks workspace).

Further, each developer has their own folder. In this folder they can create branches and make commits. You can only have one branch with open changes, so if you want to switch branches you first have to commit.

Why not a single 'Development' folder instead of one per developer? Simple: each folder can only have one branch active. If multiple people work in one folder, big mess!

If you have multiple workspaces, the same principles apply: a folder per developer and one 'golden' branch which gets promoted to the other workspaces.

Maksym
New Contributor III

Greetings, I have a similar problem.

Did you try using Databricks Workflows and scheduling the notebooks there instead of in Data Factory?

Inside Workflows it is possible to select a specific branch, so it may actually work.

What do you think?
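To illustrate the idea, here is a hedged sketch of what a job definition with a pinned branch might look like, assuming the Jobs API 2.1 `git_source` block and a notebook task with `source` set to `GIT`. The repo URL, provider, notebook path and task key are placeholders, and cluster settings are deliberately left out, so this payload is not complete enough to create a job as is.

```python
import json


def build_job_payload(name: str, git_url: str, git_provider: str,
                      branch: str, notebook_path: str) -> dict:
    """Build a job definition that runs a notebook from a specific Git branch."""
    return {
        "name": name,
        # The job checks out this branch from the remote repo at run time.
        "git_source": {
            "git_url": git_url,
            "git_provider": git_provider,
            "git_branch": branch,
        },
        "tasks": [
            {
                "task_key": "run_notebook",  # hypothetical task name
                "notebook_task": {
                    # Path relative to the repo root (placeholder).
                    "notebook_path": notebook_path,
                    "source": "GIT",
                },
                # A real job also needs compute settings here.
            }
        ],
    }


payload = build_job_payload(
    name="nightly-load",
    git_url="https://example.com/org/project.git",
    git_provider="gitHub",
    branch="release",
    notebook_path="folder/notebook",
)
print(json.dumps(payload, indent=2))
```

The difference from the Repos folder approach is that the branch lives in the job definition rather than in the state of a workspace folder, so nobody can change it by switching branches in the UI.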
