02-20-2024 03:06 AM - edited 02-20-2024 03:12 AM
Hi,
I want to run a dbt workflow task and would like to use the git integration for that. Using my personal user I am able to do so but I am running my workflows using a service principal.
I added Git credentials and the repository using Terraform. I am able to use the repositories in my workspace, but when I trigger the dbt job I get this message:
[RepositoryCheckoutFailed] Failed to checkout Git repository:
PERMISSION_DENIED: Missing Git provider credentials.
Go to User Settings > Git Integration to set up your Git credentials
I want to run the workflow with a service principal as an owner and therefore don't want to set the git credentials for my personal user. How can the service principal get access to the git credential / repository? I have checked via the cli that the credential is available on my workspace. I couldn't find any documentation on how to grant permissions to a credential.
Note that my deployment principal is not the same as the principal that runs my pipelines.
I hope somebody is able to help me with this.
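For anyone debugging the same error: as far as I understand, Databricks stores Git credentials per identity, so a credential created by the deployment principal is not automatically available to the principal that runs the job. Below is a minimal Python sketch that lists the credentials visible to whichever identity owns the token. It assumes the `GET /api/2.0/git-credentials` endpoint and a token issued for the run-as service principal. The function names and the response field names (`credentials`, `git_provider`, `git_username`) are assumptions for illustration, not something confirmed in this thread.

```python
import json
import urllib.request


def build_list_request(workspace_url: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a request that lists the caller's Git credentials."""
    return urllib.request.Request(
        f"{workspace_url.rstrip('/')}/api/2.0/git-credentials",
        method="GET",
        headers={"Authorization": f"Bearer {access_token}"},
    )


def summarize_credentials(response_body: dict) -> list:
    """Turn the (assumed) response shape into short 'provider:username' strings."""
    return [
        f"{cred.get('git_provider', '?')}:{cred.get('git_username', '?')}"
        for cred in response_body.get("credentials", [])
    ]


def list_git_credentials(workspace_url: str, access_token: str) -> list:
    """Send the request; the result reflects the identity behind access_token."""
    with urllib.request.urlopen(build_list_request(workspace_url, access_token)) as response:
        return summarize_credentials(json.load(response))
```

If the list comes back empty when called with the run-as principal's token, that would be consistent with the "Missing Git provider credentials" error even though another identity has a credential configured.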
02-21-2024 12:01 AM
To set up Git credentials for a Service Principal to access notebooks in Repos (GitHub) without any dependencies on a personal GitHub account, you can follow these steps:
1. Create a Service Principal in Azure Active Directory (Azure AD) if you haven't already. This will be used to authenticate with Azure services.
2. Assign the necessary permissions to the Service Principal. You will need to grant it appropriate permissions to access the GitHub repository where the notebooks are stored. This can be done by adding the Service Principal to the repository with the required access level (e.g., read, write, or admin).
3. Generate a Personal Access Token (PAT) on GitHub. This token will serve as the credentials for the Service Principal to authenticate with GitHub. In your GitHub account settings, go to "Developer settings" > "Personal access tokens" and generate a new token. Make sure to grant it the scopes and permissions needed to access the repository and perform the required actions.
4. Store the generated PAT securely. Treat the PAT like a password, and preferably keep it in a key vault or secret management system provided by your cloud provider.
5. Configure Git to use the Service Principal and the PAT. On the machine or environment where the jobs will run, set up Git to use the Service Principal's credentials. Run the following commands in a terminal or command prompt:
git config --global credential.username <Service Principal Client ID>
git config --global credential.helper '!f() { echo username=$GIT_USERNAME; echo password=$GIT_PASSWORD; }; f'
6. Replace <Service Principal Client ID> with the actual Client ID of your Service Principal. GIT_USERNAME should be set to the Service Principal's Client ID, and GIT_PASSWORD should be set to the PAT generated in step 3.
7. Test the Git configuration. To verify that the Git credentials are set up correctly, you can try cloning or pulling the repository using Git commands. For example:
git clone <repository_url>
If the credentials are correctly configured, the repository should be cloned without asking for any additional authentication.
By following these steps, you can set up Git credentials for a Service Principal to access notebooks in Repos (GitHub) without relying on a personal GitHub account.
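The credential helper from the steps above can also be written as a small script, which may be easier to read than the inline shell function. This is only a sketch of the same idea in Python: it relies on Git's credential helper protocol (Git passes the operation, such as `get`, as an argument and reads `username=` and `password=` lines from the helper's output) and on the `GIT_USERNAME` and `GIT_PASSWORD` environment variables named above. The script and function names are hypothetical.

```python
import os
import sys


def credential_lines(environ) -> list:
    """Return the lines Git expects from a helper, or nothing if a variable is unset."""
    username = environ.get("GIT_USERNAME")
    password = environ.get("GIT_PASSWORD")
    if not username or not password:
        return []
    return [f"username={username}", f"password={password}"]


def main(argv, environ=os.environ) -> int:
    """Answer only the 'get' operation; 'store' and 'erase' are silently ignored."""
    operation = argv[1] if len(argv) > 1 else ""
    if operation == "get":
        for line in credential_lines(environ):
            print(line)
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Assuming the script is saved at a path of your choosing, it could be registered with something like `git config --global credential.helper '!python3 /path/to/git_env_helper.py'`. Git appends the operation name when it invokes the helper.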
02-21-2024 02:45 AM
Hi @Simranarora ,
thank you for your answer. I may have expressed myself poorly, but this does not fix the issue I actually have. I have already made the connection via Git credentials using a technical user on the Git provider side (I use Bitbucket, by the way). I am also able to connect and add repositories within Databricks when adding the Git configuration under "user settings > linked account". When I run my workflow on Databricks (dbt) with my personal user (under "run as"), everything works and dbt can make its updates based on the config in the repository.
What I am looking for is a way to detach this from my personal user on Databricks. I want to run the workflow task as a service principal. However, I am not able to set the Git credentials for the service principal, and it also cannot use the configured credentials that I added via the UI.
Therefore I have tried to deploy a "git credential" via Terraform. This also allows me to check out my repositories under "Workspace". But within the workflow it fails with the above-mentioned error message.
I hope this makes it more understandable!
03-06-2024 01:09 AM
Hi @Carsten03 ,
I'm facing the same issue at the moment. Have you been able to solve this issue?
Cheers,
03-06-2024 05:44 AM
Hey @MiroFuoli, unfortunately not. I am using my personal user, and hope that at some point somebody has a solution. If you find anything, let me know!
06-11-2024 01:47 AM
Hi @Carsten03 , @MiroFuoli ,
I had a very similar issue and finally got it to work, though this pretty much depends on your setup. I am using Azure Databricks and Bitbucket. If you have a similar setup, the following hopefully works for you.
curl -X POST -H 'Content-Type: application/x-www-form-urlencoded' \
https://login.microsoftonline.com/<tenant-id>/oauth2/v2.0/token \
-d 'client_id=<client-id>' \
-d 'grant_type=client_credentials' \
-d 'scope=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d%2F.default' \
-d 'client_secret=<client-secret>'
You can find the <tenant-id> and <client-id> in the Azure portal on the overview page of your service principal. The <client-secret> is the one you should have stored in step 1.
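For those who prefer a script over curl, here is a rough Python equivalent of the token request above. It is a sketch using only the standard library and the same endpoint, scope, and form fields as the curl call. The function names are my own, and the `access_token` field in the response is what the client credentials flow normally returns, not something shown in this post.

```python
import json
import urllib.parse
import urllib.request

# Resource ID taken from the scope in the curl command above.
DATABRICKS_RESOURCE_ID = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) the client credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = urllib.parse.urlencode({
        "client_id": client_id,
        "grant_type": "client_credentials",
        "scope": f"{DATABRICKS_RESOURCE_ID}/.default",  # urlencode escapes "/" as %2F
        "client_secret": client_secret,
    }).encode("ascii")
    return urllib.request.Request(
        url,
        data=form,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def fetch_access_token(tenant_id: str, client_id: str, client_secret: str) -> str:
    """Send the request and return the bearer token for the service principal."""
    request = build_token_request(tenant_id, client_id, client_secret)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["access_token"]
```

Keeping the request builder separate from the network call makes it easy to inspect the URL and body before sending any secret over the wire.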
curl -X POST "https://<databricks-host>.azuredatabricks.net/api/2.0/git-credentials" \
--header 'Authorization: Bearer <azure-service-principal-access-token>' \
--data '{"personal_access_token" : "<repository-access-token>", "git_username" : "x-token-auth", "git_provider" : "bitbucketCloud" }' | jq .
Note that the value of "git_username" should be "x-token-auth". Apart from that, you simply use the credentials created in the previous two steps.
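The second call can be sketched the same way in Python. This mirrors the curl command: same endpoint, same three JSON fields, and the service principal's token in the `Authorization` header, so the credential ends up attached to the service principal rather than to a personal user. The function names and the explicit JSON `Content-Type` header are my additions.

```python
import json
import urllib.request


def build_git_credential_request(workspace_url: str, sp_access_token: str,
                                 repo_access_token: str) -> urllib.request.Request:
    """Build (but do not send) the request that stores a Bitbucket token for the caller."""
    payload = {
        "personal_access_token": repo_access_token,
        "git_username": "x-token-auth",  # fixed value for Bitbucket repository access tokens
        "git_provider": "bitbucketCloud",
    }
    return urllib.request.Request(
        f"{workspace_url.rstrip('/')}/api/2.0/git-credentials",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {sp_access_token}",
            "Content-Type": "application/json",
        },
    )


def create_git_credential(workspace_url: str, sp_access_token: str, repo_access_token: str) -> dict:
    """Send the request and return the parsed JSON response."""
    request = build_git_credential_request(workspace_url, sp_access_token, repo_access_token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Here `workspace_url` would be something like `https://<databricks-host>.azuredatabricks.net`, and `sp_access_token` is the token obtained in the previous step.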
I hope that helps you.
Wish you the best!
06-11-2024 02:23 AM
Nice, I will have to check that out! I had already given up on this.
06-18-2024 03:02 AM
We also have implemented it like this, and while it works, in my opinion it doesn't cleanly solve the problem of detaching a service principal from a user:
The OAuth client credentials flow exists for use cases like this, and I hope it eventually makes its way to Databricks.
FWIW I have a discussion on this topic: https://community.databricks.com/t5/forums/v5/forumtopicpage.inlinemessagereplyeditor.form.form.form...
06-18-2024 03:04 AM
I created that link using the "Share" button in the post but it's broken, sorry.
Here's a working link to the discussion: https://community.databricks.com/t5/data-engineering/git-credentials-for-service-principals-running-...