Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Repos configuration for Azure Service Principal

pantelis_mare
Contributor III

Hello community!

I would like to update a repo from within my Azure DevOps release pipeline.

In the pipeline I generate a token using an AAD Service Principal, as recommended, and I set up the Databricks API using that token.

When I run the databricks repos update command, I receive an authentication error, which is expected because the service principal has no Git credentials configured on the workspace side.

My question is:

Can I configure the repos for the SPN programmatically?

Or is there a way to provide an Azure DevOps token when I make the Databricks API call? I have tried passing a token by setting AZURE_DEVOPS_EXT_PAT, but it doesn't seem to work.

Thank you in advance!

31 REPLIES 31

The solution depends on Azure DevOps (ADO) accepting service principal tokens as authentication. This is mentioned on their roadmap (https://docs.microsoft.com/en-us/azure/devops/release-notes/features-timeline), but no timeline is defined yet.

As a workaround we use a user PAT for the time being.

Kirk1
New Contributor III

I see that Microsoft has a draft timeline for this: FY23Q1 (TBC), public preview to all internal and external customers. I don't know why it is taking them so long.

AlexMethod
New Contributor II

Any updates?

Dave_B_
New Contributor III

I was able to accomplish this by getting an AAD service account (not the service principal used by CI/CD) and got rights to use that account to connect to our github repo. I used the databricks git config api to configure the service principal user git config with the AAD service account and PAT that I generated and authorized in github. I then used the Repos CLI to create the repo in databricks under the service principal. I just got this working today. Note that I'm still trying to figure out how to update from github as we are not making any changes within databricks itself.
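
For the "create the repo, then update it from GitHub" part, here is a minimal sketch of the two Repos API calls, as an alternative to the Repos CLI. It only builds the requests and does not send them. The function names, the host, and the workspace path are hypothetical; the endpoints are `POST /api/2.0/repos` and `PATCH /api/2.0/repos/{repo_id}`, and the token is assumed to be the service principal's access token.

```python
import json
import urllib.request


def repos_request(host, token, method, path, payload):
    """Build (but do not send) a Databricks Repos API request."""
    return urllib.request.Request(
        url=f"{host.rstrip('/')}{path}",
        data=json.dumps(payload).encode("utf-8"),
        method=method,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def create_repo_request(host, token, git_url, workspace_path):
    # POST /api/2.0/repos clones the remote repository into the workspace.
    payload = {"url": git_url, "provider": "gitHub", "path": workspace_path}
    return repos_request(host, token, "POST", "/api/2.0/repos", payload)


def update_repo_request(host, token, repo_id, branch):
    # PATCH /api/2.0/repos/{repo_id} checks out the branch at its latest commit.
    return repos_request(
        host, token, "PATCH", f"/api/2.0/repos/{repo_id}", {"branch": branch}
    )
```

Passing the returned object to `urllib.request.urlopen` would send the call. Both calls rely on a Git credential already being bound to the identity that owns the token.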

gentresh
New Contributor III

Dave, do you have a moment to guide me through these steps of yours?

  • What do you mean by, "got rights to use that account to connect to our github repo". What steps would you need to follow, or are you referring to an approval?
  • The reason I ask is that we recently changed from a user-based to a service-based approach, and although the Azure Service Principal has not changed, I get a PERMISSION_DENIED - cannot read repo.
  • I think that because of our recent change, I will need to assign the new GitHub PAT to the Service Principal; however, I see no way of doing that in the Databricks workspace GUI.
  • Can you point me to any documentation for: "I used the databricks git config api to configure the service principal user git config with the AAD service account and PAT that I generated and authorized in github"
  • PS: I'm a databricks newb that has inherited a rather complex Azure landing zone.

Help would be more than appreciated!!!

Hi @Gent Reshtani​ 

As SPNs are not yet recognized by Azure DevOps, we must use a service account for the repo part; everything else should still use the SPN.

  1. Log in to Azure DevOps with the service account and create a PAT.
  2. Create a Git credential for the SPN with the service account's PAT, using the Git Credentials API 2.0.
  3. From now on, each time a job running as the SPN needs a Git operation, it will automatically use the Git credential created in step 2, which is the service account's PAT.

Be aware that you need to renew the PAT and update the Git credential periodically.
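
Step 2 could look roughly like the sketch below, which builds (without sending) the `POST /api/2.0/git-credentials` request. The function name and the example values are hypothetical. The three payload fields are those of the Git Credentials API, and the bearer token is assumed to be the SPN's own access token, so that the credential is bound to the SPN rather than to a user.

```python
import json
import urllib.request


def create_git_credential_request(host, spn_token, git_username, pat,
                                  git_provider="azureDevOpsServices"):
    """Build the request that binds a service account's PAT to the SPN."""
    payload = {
        "personal_access_token": pat,   # PAT created by the service account
        "git_username": git_username,   # service account name, not the SPN
        "git_provider": git_provider,   # e.g. "gitHub" for GitHub
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/git-credentials",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            # Authenticate as the SPN: the credential is stored for the caller.
            "Authorization": f"Bearer {spn_token}",
            "Content-Type": "application/json",
        },
    )
```

Sending the request is a matter of passing it to `urllib.request.urlopen`. In a pipeline, the PAT would normally come from a secret variable rather than a literal.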

gentresh
New Contributor III

I am using GitHub as my Git provider, not Azure DevOps. As I understand it, I need to update the Git credentials for my Service Principal. That's my question: how do I make sure that my SPN is using the correct Git credentials (if the PAT has changed, for example)?

Although the Databricks API displays Git credentials in a list, in fact you can only bind one Git credential per SPN. So if it works after the binding, it's the right one; if not, change it.
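
Since only one credential is bound per SPN, "changing it" can be sketched as: list the credentials, take the ID of the single entry, and PATCH that entry with the new PAT. The helper names below are hypothetical, and the list response is assumed to have the shape `{"credentials": [{"credential_id": ...}]}`. The code only parses a response and builds a request; nothing is sent.

```python
import json
import urllib.request


def existing_credential_id(list_response):
    """Return the ID of the bound credential, or None if there is none.

    `list_response` is the decoded JSON of GET /api/2.0/git-credentials
    (assumed shape: {"credentials": [{"credential_id": ...}, ...]}).
    """
    credentials = list_response.get("credentials") or []
    return credentials[0]["credential_id"] if credentials else None


def update_git_credential_request(host, spn_token, credential_id,
                                  git_username, new_pat, git_provider="gitHub"):
    """Build the PATCH request that replaces the PAT on an existing credential."""
    payload = {
        "personal_access_token": new_pat,
        "git_username": git_username,
        "git_provider": git_provider,
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/git-credentials/{credential_id}",
        data=json.dumps(payload).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {spn_token}",
            "Content-Type": "application/json",
        },
    )
```

If `existing_credential_id` returns `None`, a create call (POST to the same collection endpoint) would be needed instead of a PATCH.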

gentresh
New Contributor III

@Xiang ZHU​ ,

Okay, I am very close to being able to change the Git credential for my Service Principal. My initial error was that the service principal didn't have the correct permissions to view the repo. I suspect this is because the PAT has changed (it used to work before).

How do I leverage the Git Credentials API to make this change? Do I use Postman, or something similar?

Apologies for the low-level question here.

Use whatever tool you like to call the API; Postman is certainly OK.

Just follow the official API guide: https://docs.databricks.com/dev-tools/api/latest/gitcredentials.html#operation/create-git-credential

There are only 3 fields in the payload, and the official example is already for a GitHub PAT.

N.B. Call the API with the SPN access token for API authentication. Here is a snippet to get the access token:

curl -X POST \
"https://login.microsoftonline.com/$tenantId/oauth2/v2.0/token" \
-H "Content-Type: application/x-www-form-urlencoded" \
-d "client_id=$client_id" \
-d 'grant_type=client_credentials' \
-d 'scope=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d%2F.default' \
-d "client_secret=$client_secret" \
| jq -r .access_token
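
For pipelines that script in Python rather than bash, the same client-credentials request can be sketched as follows. The function names are hypothetical; the endpoint, form fields, and scope are copied from the curl snippet above. The request is only built, not sent.

```python
import json
import urllib.parse
import urllib.request

# Resource ID used in the scope of the curl snippet above.
DATABRICKS_RESOURCE_ID = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"


def aad_token_request(tenant_id, client_id, client_secret):
    """Build the form-encoded token request for the service principal."""
    form = urllib.parse.urlencode({
        "client_id": client_id,
        "grant_type": "client_credentials",
        "scope": f"{DATABRICKS_RESOURCE_ID}/.default",
        "client_secret": client_secret,
    })
    return urllib.request.Request(
        url=f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token",
        data=form.encode("ascii"),
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def parse_access_token(response_body):
    """Extract the token from the JSON response (the `jq -r .access_token` step)."""
    return json.loads(response_body)["access_token"]
```

A caller would send the request with `urllib.request.urlopen` and pass the response body to `parse_access_token`.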

Dave_B_
New Contributor III

Here is a sample in PowerShell for Azure that checks whether a Git credential is already configured; if not, it sets the Git credential for the user. You would just need to tweak it to change the PAT if needed:

# Assumes $databricksUrl, $databricksToken, $azToken, $wsId, $gitPat and $gitUsername are already set.
"Getting current git-credentials..."
$uri = $databricksUrl + "/api/2.0/git-credentials"
$headers = @{
  "Authorization" = "Bearer $databricksToken"
  "X-Databricks-Azure-SP-Management-Token" = $azToken
  "X-Databricks-Azure-Workspace-Resource-Id" = $wsId
}
# Print the header names only, so that tokens do not end up in the pipeline log.
"Headers: " + ($headers.Keys -join ", ")

"Checking if git-config already exists."
$gitconfig = Invoke-RestMethod -Uri $uri -Headers $headers
# The list response is expected to carry a "credentials" array when a credential is bound.
if ($gitconfig.credentials) {
   "Git config already exists"
} else {
   # Build the JSON body from a hashtable so the variables are expanded and quoted correctly.
   $body = @{
     "personal_access_token" = $gitPat
     "git_username" = $gitUsername
     "git_provider" = "gitHub"
   } | ConvertTo-Json
   $gitconfig = Invoke-RestMethod -Method 'Post' -Uri $uri -Headers $headers -Body $body -ContentType "application/json"
   $gitconfig
}

gentresh
New Contributor III

Thank you for this script. It gave me additional insight into Databricks access keys.

I ran the following curl command using bash (essentially the same):

curl -X PATCH -H "Authorization: Bearer $DB_TOKEN" \
-H "X-Databricks-Azure-SP-Management-Token: $AZ_TOKEN" \
-H "X-Databricks-Azure-Workspace-Resource-Id: $WS_ID" \
-d "{\"personal_access_token\": \"$PAT\", \"git_username\": \"$GITUSER\", \"git_provider\": \"gitHub\"}" \
https://$DATABRICKS_URL/api/2.0/git-credentials/801978151980718

It works. I can then use the same headers to query other information from the workspace.

My main issue is the following error, which I get when I run the curl command below:

{"error_code":"PERMISSION_DENIED","message":"PERMISSION_DENIED: Missing required permissions [View] on node with ID '1759335429158542'"}

However, I am unable to locate anything with that ID. I can't view it, and I can't delete it.

curl -X GET -H "Authorization: Bearer $DB_TOKEN" \
-H "X-Databricks-Azure-SP-Management-Token: $AZ_TOKEN" \
-H "X-Databricks-Azure-Workspace-Resource-Id: $WS_ID" \
https://adb-7866570032917376.16.azuredatabricks.net/api/2.0/repos/1759335429158542

This is causing Terraform to fail. Do you have any idea what could cause this?

Akhila31
New Contributor II

Hi @Xiang ZHU, when generating the credential for the SP, I don't have a Git username for the SP in Azure DevOps, since DevOps does not support adding permissions for an SP. Can you please advise?

The git_username is the service account's name. This API binds the service account's PAT to the SP, which is why you need to use the SP's access token in the API auth header.

xiangzhu
Contributor III

My use case is to create a dbt task inside a Databricks workflow.

It needs to specify `git_source`, and my workflow runs under a service principal account.

Unfortunately, at the very beginning of the workflow run, an error like the following is raised:

"Failed to checkout Git repository: PERMISSION_DENIED: Encountered an error with your Azure Active Directory credentials. Please try logging out of Azure Active Directory (https://portal.azure.com) and logging back in."

But I have nowhere to grant the service principal checkout permission in Azure DevOps.

Replacing the service principal with a standard user account works, but we cannot use a user account in production.
