
Run workflow using git integration with service principal

Carsten03
New Contributor III

Hi,

I want to run a dbt workflow task and would like to use the git integration for that. Using my personal user I am able to do so, but I run my workflows as a service principal.

I added git credentials and the repository using terraform. I am able to use the repositories in my workspace, but when I trigger the dbt job I get this message:

 

[RepositoryCheckoutFailed] Failed to checkout Git repository: 
PERMISSION_DENIED: Missing Git provider credentials.
Go to User Settings > Git Integration to set up your Git credentials

I want to run the workflow with a service principal as the owner and therefore don't want to set the git credentials for my personal user. How can the service principal get access to the git credential / repository? I have checked via the CLI that the credential is available in my workspace. I couldn't find any documentation on how to grant permissions to a credential.

Note that my deployment principal is not the same as the principal that runs my pipelines.

I hope somebody is able to help me with this.
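One thing worth checking here: Databricks git credentials belong to a single identity, and the list call of the Git credentials API returns only the caller's own credentials. Since the deployment principal and the run-as principal differ, a credential created with the deployment principal's token may not be visible to the principal the job runs as. A minimal sketch for comparing the two, assuming you can obtain a Databricks access token for each principal; the function names, the <workspace-host> placeholder and the token variables are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: list the git credentials owned by whichever identity a token belongs to.
# Function names and the environment variables in the usage lines are hypothetical.

# Build the Git credentials endpoint from a bare host or a full https:// URL.
git_credentials_endpoint() {
  local host="${1#https://}"
  host="${host%/}"
  printf 'https://%s/api/2.0/git-credentials' "$host"
}

# GET /api/2.0/git-credentials returns only the caller's own credentials (raw JSON).
list_git_credentials() {
  local host="$1" token="$2"
  curl -sS -H "Authorization: Bearer ${token}" "$(git_credentials_endpoint "$host")"
}

# Hypothetical usage: compare the deployment principal with the run-as principal.
#   list_git_credentials "<workspace-host>" "$DEPLOY_SP_TOKEN"
#   list_git_credentials "<workspace-host>" "$RUN_AS_SP_TOKEN"
```

If the run-as principal's list comes back without a credential, that would be consistent with the "Missing Git provider credentials" error above.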


Simranarora
New Contributor III

To set up Git credentials for a Service Principal to access notebooks in Repos (GitHub) without any dependencies on a personal GitHub account, you can follow these steps:

  1. Create a Service Principal in Azure Active Directory (Azure AD) if you haven't already. This will be used to authenticate with Azure services.

  2. Assign the necessary permissions to the Service Principal. You will need to grant it appropriate permissions to access the GitHub repository where the notebooks are stored. This can be done by adding the Service Principal to the repository with the required access level (e.g., read, write, or admin).

  3. Generate a Personal Access Token (PAT) in GitHub. This token will serve as the credentials for the Service Principal to authenticate with GitHub. In GitHub, go to your account Settings > Developer settings > Personal access tokens and generate a new token. Make sure to grant it the scopes and permissions needed to access the repository and perform the required actions.

  4. Store the generated PAT securely. Treat the PAT like a password; a key vault or the secret management service provided by your cloud provider is the recommended place to keep it.

  5. Configure Git to use the Service Principal and the PAT. On the machine or environment where the jobs will run, set up Git to use the Service Principal's credentials. Run the following commands in a terminal or command prompt:

git config --global credential.username <Service Principal Client ID>
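# Single quotes stop the shell from expanding "!" and the variables now; they are read when Git runs the helper.
git config --global credential.helper '!f() { test "$1" = get || return 0; echo "username=$GIT_USERNAME"; echo "password=$GIT_PASSWORD"; }; f'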

  6. Replace <Service Principal Client ID> with the actual Client ID of your Service Principal. GIT_USERNAME should be set to the Service Principal's Client ID, and GIT_PASSWORD should be set to the PAT generated in step 3.

  7. Test the Git configuration. To verify that the Git credentials are set up correctly, try cloning or pulling the repository with Git. For example:

git clone <repository_url>

If the credentials are correctly configured, the repository is cloned without prompting for any additional authentication.
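A note on the helper in step 5: Git runs a credential helper with an action argument (get, store or erase) and, for get, reads username= and password= lines from the helper's standard output. The sketch below mirrors the inline helper from step 5 as a named shell function that answers only the get action, so it can be tried outside Git. The name env_credential_helper is hypothetical, and the sketch assumes GIT_USERNAME and GIT_PASSWORD are exported in the job's environment.

```shell
#!/usr/bin/env bash
# Hypothetical named version of the inline "!f() { ... }; f" helper from step 5.
# Git calls a helper as "<helper> get|store|erase"; only "get" needs to answer,
# with key=value lines on stdout.
env_credential_helper() {
  # Nothing to store or erase: the credentials live in environment variables.
  if [ "${1:-}" != "get" ]; then
    return 0
  fi
  printf 'username=%s\n' "${GIT_USERNAME:-}"
  printf 'password=%s\n' "${GIT_PASSWORD:-}"
}

# Try it outside Git (prints the two lines Git would read):
#   GIT_USERNAME=u GIT_PASSWORD=p env_credential_helper get
# Check the helper actually configured in Git through Git's own plumbing:
#   printf 'protocol=https\nhost=github.com\n\n' | git credential fill
```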

By following these steps, you can set up Git credentials for a Service Principal to access notebooks in Repos (GitHub) without relying on a personal GitHub account.

 
Best Regards,
Simran

Carsten03
New Contributor III

Hi @Simranarora ,

thank you for your answer. I am not sure whether I expressed myself poorly, but this does not fix the issue I actually have. I have already made the connection via git credentials using a technical user on the git provider side (I use Bitbucket, by the way). I am also able to connect and add repositories within Databricks when adding the git configuration under "user settings > linked account". When I run my workflow on Databricks (dbt) with my personal user (under "run as"), everything works and dbt can make its updates based on the config in the repository.

What I am looking for is how to detach this from my personal user on Databricks. I want to run the workflow task as a service principal. However, I am not able to set the git credentials for the service principal, and it also cannot use the credentials that I configured via the UI.

Therefore I tried to deploy a "git credential" via terraform. This also allows me to check out my repositories under "Workspace". But within the workflow, the task fails with the error message mentioned above.

I hope this makes it more understandable!

MiroFuoli
New Contributor II

Hi @Carsten03 ,

I'm facing the same issue at the moment. Have you been able to solve this issue?

Cheers,

Carsten03
New Contributor III

Hey @MiroFuoli, unfortunately not 😞. I am using my personal user, and hope that at some point somebody has a solution. If you find anything, let me know! 

justOnePost
New Contributor III (accepted solution)

Hi @Carsten03, @MiroFuoli,
I had a very similar issue and finally got it to work. How you do it depends heavily on your setup, though. I am using Azure Databricks and Bitbucket; if you have a similar setup, the following should hopefully work for you.

  1. Create an Azure service principal: Use the Azure portal or Terraform to set up an Azure service principal. Make sure you store its secret somewhere safe, such as an Azure Key Vault.
  2. Add the service principal to the Databricks workspace: The previously created service principal must be added to the respective Databricks workspace. You can do that with the Databricks API, Terraform or simply in the UI.
    For the latter, go to Settings > Identity and access > Service principals > Add a service principal; in the last step, enter the application ID of your service principal and add it.
  3. Create a repository access token: In your Bitbucket repo, go to Repository settings > Access tokens > Create Repository Access Token. The access token is shown only once, so copy it right away.
  4. Create an Azure service principal access token: Similarly, you have to create an access token for your service principal, this time from the Azure AD (Microsoft Entra ID) token endpoint. Here is an example using curl:

 

curl -X POST -H 'Content-Type: application/x-www-form-urlencoded' \
https://login.microsoftonline.com/<tenant-id>/oauth2/v2.0/token \
-d 'client_id=<client-id>' \
-d 'grant_type=client_credentials' \
-d 'scope=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d%2F.default' \
-d 'client_secret=<client-secret>'

 

You can find the <tenant-id> and <client-id> in the Azure portal on the overview page of your service principal. The <client-secret> is the secret you stored in step 1.

  5. Add git credentials for your service principal: Using the Databricks API, we can now add git credentials for this service principal.

 

curl -X POST "https://<databricks-host>.azuredatabricks.net/api/2.0/git-credentials" \
--header 'Authorization: Bearer <azure-service-principal-access-token>' \
--header 'Content-Type: application/json' \
--data '{"personal_access_token" : "<repository-access-token>", "git_username" : "x-token-auth", "git_provider" : "bitbucketCloud" }' | jq .

 

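Note that the value of "git_username" must be "x-token-auth". Apart from that, you simply use the tokens created in steps 3 and 4: the repository access token goes into the payload, and the service principal's access token goes into the Authorization header.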
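Steps 4 and 5 can be wrapped in a small script. The following is a minimal sketch, assuming curl and jq are installed (the post already pipes into jq) and the same Azure Databricks plus Bitbucket Cloud setup; the function names, argument order and the <workspace-host> placeholder are hypothetical. It URL-encodes the client secret, which matters if the secret contains characters such as + or &, and builds the JSON payload with printf, which assumes the repository access token contains no double quotes or backslashes.

```shell
#!/usr/bin/env bash
# Sketch of steps 4 and 5 as functions (names are hypothetical). Needs curl and jq.

# Azure Databricks resource ID from the post, requested as "<id>/.default" (%2F = "/").
DATABRICKS_SCOPE='2ff814a6-3304-4ab8-85cb-cd0e6f879c1d%2F.default'

# Step 4: client-credentials token for the service principal, printed on stdout.
get_sp_access_token() {
  local tenant_id="$1" client_id="$2" client_secret="$3" token
  token="$(curl -sS \
    -H 'Content-Type: application/x-www-form-urlencoded' \
    "https://login.microsoftonline.com/${tenant_id}/oauth2/v2.0/token" \
    -d "client_id=${client_id}" \
    -d 'grant_type=client_credentials' \
    -d "scope=${DATABRICKS_SCOPE}" \
    --data-urlencode "client_secret=${client_secret}" \
    | jq -r '.access_token // empty')"
  if [ -z "$token" ]; then
    echo "token request failed for client ${client_id}" >&2
    return 1
  fi
  printf '%s\n' "$token"
}

# Step 5 payload; defaults match the Bitbucket Cloud values from the post.
# No JSON escaping is done, so the token must not contain '"' or '\'.
build_git_credential_payload() {
  local pat="$1" username="${2:-x-token-auth}" provider="${3:-bitbucketCloud}"
  printf '{"personal_access_token": "%s", "git_username": "%s", "git_provider": "%s"}' \
    "$pat" "$username" "$provider"
}

# Step 5: create the git credential for the identity that owns sp_token.
# host is the bare workspace hostname, e.g. "<workspace-host>".
create_git_credential() {
  local host="$1" sp_token="$2" pat="$3"
  curl -sS "https://${host}/api/2.0/git-credentials" \
    -H "Authorization: Bearer ${sp_token}" \
    -H 'Content-Type: application/json' \
    --data "$(build_git_credential_payload "$pat")"
}

# Hypothetical usage; all values are placeholders.
#   sp_token="$(get_sp_access_token "$TENANT_ID" "$CLIENT_ID" "$CLIENT_SECRET")"
#   create_git_credential "<workspace-host>" "$sp_token" "$BITBUCKET_REPO_TOKEN"
```

Because the credential is created with the service principal's own token, it belongs to that principal, which is the principal the job should then run as.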

I hope that helps you.
Wish you the best!

-werners-
Esteemed Contributor III

Nice, I will have to check that out! I had already given up on this.

camilo_s
New Contributor III

We have also implemented it like this, and while it works, in my opinion it doesn't cleanly solve the problem of detaching a service principal from a user:

  • If you use a "technical user" on the Git provider side, which is just a plain user created for the workload, you may have to deactivate any MFA your org has and rely on basic authentication alone. The fact that the service principal's data and repo access would then be protected only by basic authentication should at least ring an alarm (this is a very real risk, as demonstrated by recent news involving a major warehouse provider's customers who relied on plain basic auth).
  • You can instead use a PAT tied to a Git provider user, but this defeats the purpose of using the service principal for automation, since you still have to manage the lifecycle of that user's PAT.

The OAuth client credentials flow exists for use cases like this, and I hope it eventually makes its way to Databricks.

FWIW I have a discussion on this topic: https://community.databricks.com/t5/forums/v5/forumtopicpage.inlinemessagereplyeditor.form.form.form...

camilo_s
New Contributor III

I created that link using the "Share" button in the post but it's broken, sorry 😂

Here's a working link to the discussion: https://community.databricks.com/t5/data-engineering/git-credentials-for-service-principals-running-...
