Call Databricks notebook in a specific branch from Azure Data Factory?

baatchus
New Contributor III

I'm using the new Databricks Repos functionality. In the Azure Data Factory UI, the Notebook activity lets you browse the Databricks workspace and select Repos > username > project > folder > notebook.

Is it possible to call a Databricks notebook in a specific branch from Data Factory?

1 ACCEPTED SOLUTION

-werners-
Esteemed Contributor III

Exactly.

We have a folder called 'Production' which is set to main, always.

The only time we do something in this folder is to pull the (approved) changes from git.

Data Factory uses this path for executing notebooks.

You can do a REST call to make sure the branch is set to main, but we don't, because we are disciplined enough not to mess with it 🙂

You can do the same for QA (if it resides in the same Databricks workspace).

Further, each developer has their own folder. In this folder they can create branches and make commits. You can only have one branch with open changes, so if you want to switch branches you first have to commit.

Why not a single 'Development' folder instead of one per developer? Simple: each folder can only have one branch active. If multiple people work in one folder, it becomes a big mess!

If you have multiple workspaces, the same principles apply: a folder per developer and one 'golden' branch that gets promoted to the other workspaces.

5 REPLIES

Ryan_Chynoweth
Esteemed Contributor

I believe the branch defaults to whichever one was set via the UI or the Repos REST API.

In your workflow, I think you would need to call the REST API to change the branch of the repo, then execute the notebook. For example, this may be a situation where you create a release branch and want to dynamically change the code that is executing. You may need to make a pull request as well, but I'm not certain.
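A rough sketch of that two-step sequence, expressed as the REST calls it would need. It assumes `PATCH /api/2.0/repos/{repo_id}` for the branch switch and the Jobs API `POST /api/2.1/jobs/runs/submit` for a one-off notebook run on an existing cluster; the function name, repo id, cluster id, and notebook path are hypothetical placeholders. It only builds the payloads, so you can send them with whatever HTTP client or ADF Web activity you prefer.

```python
def release_run_plan(repo_id, release_branch, notebook_path, cluster_id,
                     params=None):
    """Return the ordered REST calls for 'switch branch, then run notebook'.

    Each step is a (HTTP method, endpoint, JSON body) tuple.
    """
    # Step 1: point the Repos folder at the release branch.
    switch = ("PATCH", f"/api/2.0/repos/{repo_id}", {"branch": release_branch})
    # Step 2: submit a one-time run of the notebook from that folder.
    run = (
        "POST",
        "/api/2.1/jobs/runs/submit",
        {
            "run_name": f"run on {release_branch}",
            "tasks": [
                {
                    "task_key": "notebook",
                    "existing_cluster_id": cluster_id,
                    "notebook_task": {
                        "notebook_path": notebook_path,
                        "base_parameters": params or {},
                    },
                }
            ],
        },
    )
    return [switch, run]


plan = release_run_plan(42, "release/1.4",
                        "/Repos/Production/project/folder/notebook",
                        "0101-123456-abcde")
for method, endpoint, _body in plan:
    print(method, endpoint)
# PATCH /api/2.0/repos/42
# POST /api/2.1/jobs/runs/submit
```

One caveat with this approach: the branch switch changes the folder for everything that reads from it, so two pipelines sharing one folder but needing different branches would interfere with each other.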

Let me know if this helps. I can provide more detail.

-werners-
Esteemed Contributor III

Data Factory will take the branch which was set by the Databricks Web UI.

So you could use the REST API, but what we do is simply use different folders.

You can create folders in Repos, and under each folder you can select a different branch.

(e.g. your own user folder for dev, and another one for production/QA, etc.)
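A minimal sketch of how such a folder-per-branch layout could be described and then created through the Repos API. It assumes `POST /api/2.0/repos` accepts `url`, `provider`, and `path`, that the response contains the new repo's `id`, and that you then `PATCH /api/2.0/repos/{id}` with the desired branch; it also assumes the top-level folders already exist. The folder names, user name, project name, and Azure DevOps URL are hypothetical.

```python
def checkout_specs(git_url, project, folder_branches,
                   provider="azureDevOpsServices"):
    """Describe one Repos checkout per top-level folder, each on its own branch.

    `folder_branches` maps a folder under /Repos (e.g. 'Production', 'QA',
    or a developer's user folder) to the branch that folder should track.
    """
    specs = []
    for folder, branch in folder_branches.items():
        specs.append({
            # Body for POST /api/2.0/repos (creates the checkout).
            "create": {
                "url": git_url,
                "provider": provider,
                "path": f"/Repos/{folder}/{project}",
            },
            # Branch to set afterwards via PATCH /api/2.0/repos/{id}.
            "branch": branch,
        })
    return specs


layout = checkout_specs(
    "https://dev.azure.com/org/proj/_git/repo",
    "project",
    {"Production": "main", "QA": "qa", "alice@example.com": "feature/x"},
)
for spec in layout:
    print(spec["create"]["path"], "->", spec["branch"])
```

Data Factory would then only ever point at paths under `/Repos/Production/...` (or `/Repos/QA/...`), while developers switch branches freely in their own user folders.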

baatchus
New Contributor III

OK, so you create a top-level folder called Production, then sync the repo into it and keep it on the main/master branch.

Do you call the REST API from Data Factory to update the branch before running the notebook activity?

Do you parameterize the notebook path in Data Factory for the top-level folder in Databricks Repos?

What does a typical workflow look like? Have you configured Databricks Repos with top-level folders in all environments: Dev, Stage, Production?

So many questions, but so little documentation and guidance from Databricks/Microsoft on this part.

Maksym
New Contributor III

Greetings, I have a similar problem.

Did you try using Databricks Workflows instead, and scheduling them there rather than in Data Factory?

Inside Workflows it is possible to select a specific branch, so that may actually work.
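For reference, a sketch of what a branch-pinned job definition could look like. It assumes the Jobs API 2.1 `git_source` block (`git_url`, `git_provider`, `git_branch`) and a notebook task with `"source": "GIT"`, where, as I understand it, `notebook_path` is relative to the repo root. The job name, URL, cluster id, and path are hypothetical. The resulting dict would be posted to `/api/2.1/jobs/create`, and Data Factory could then trigger the job (e.g. via `/api/2.1/jobs/run-now` with the returned `job_id`) instead of running the notebook directly.

```python
def git_job_spec(name, git_url, branch, notebook_path, cluster_id,
                 provider="azureDevOpsServices"):
    """Build a Jobs API 2.1 create payload that runs a notebook from a Git branch."""
    return {
        "name": name,
        # The job checks out this branch itself, independent of any Repos folder.
        "git_source": {
            "git_url": git_url,
            "git_provider": provider,
            "git_branch": branch,
        },
        "tasks": [
            {
                "task_key": "notebook",
                "existing_cluster_id": cluster_id,
                "notebook_task": {
                    # With source GIT, the path is relative to the repo root.
                    "notebook_path": notebook_path,
                    "source": "GIT",
                },
            }
        ],
    }


spec = git_job_spec("nightly-load", "https://dev.azure.com/org/proj/_git/repo",
                    "main", "folder/notebook", "0101-123456-abcde")
print(spec["git_source"]["git_branch"], spec["tasks"][0]["notebook_task"]["notebook_path"])
```

The appeal of this design is that the branch lives in the job definition rather than in a shared workspace folder, so nobody can accidentally change what production runs by switching a branch in the UI.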

What do you think?
