02-20-2024 03:06 AM - edited 02-20-2024 03:12 AM
Hi,
I want to run a dbt workflow task and would like to use the Git integration for it. This works with my personal user, but I run my workflows using a service principal.
I added the Git credentials and the repository using Terraform. I can use the repositories in my workspace, but when I trigger the dbt job I get this message:
[RepositoryCheckoutFailed] Failed to checkout Git repository:
PERMISSION_DENIED: Missing Git provider credentials.
Go to User Settings > Git Integration to set up your Git credentials
I want to run the workflow with a service principal as the owner and therefore don't want to set the Git credentials for my personal user. How can the service principal get access to the Git credential / repository? I have checked via the CLI that the credential is available in my workspace. I couldn't find any documentation on how to grant permissions to a credential.
Note that my deployment principal is not the same as the principal that runs my pipelines.
I hope somebody is able to help me with this.
02-21-2024 12:01 AM
To set up Git credentials for a Service Principal to access notebooks in Repos (GitHub) without any dependencies on a personal GitHub account, you can follow these steps:
1. Create a Service Principal in Azure Active Directory (Azure AD) if you haven't already. This will be used to authenticate with Azure services.
2. Assign the necessary permissions. An Azure AD Service Principal cannot be added to a GitHub repository directly, so grant the required access level (e.g., read, write, or admin) on the repository to the GitHub account whose token the Service Principal will use, for example a dedicated machine user.
3. Generate a Personal Access Token (PAT) for that GitHub account. This token will serve as the credentials for the Service Principal to authenticate with GitHub. In GitHub, open the account's Settings > Developer settings > Personal access tokens and generate a new token. Make sure to grant it the scopes and permissions needed to access the repository and perform the required actions.
4. Store the generated PAT securely. Treat the PAT like a password; it's recommended to keep it in a key vault or the secret management system provided by your cloud provider.
5. Configure Git to use the Service Principal and the PAT. On the machine or environment where the jobs will run, set up Git to use the Service Principal's credentials by running the following commands in a terminal or command prompt:
git config --global credential.username <Service Principal Client ID>
git config --global credential.helper '!f() { if [ "$1" = get ]; then echo "username=${GIT_USERNAME}"; echo "password=${GIT_PASSWORD}"; fi; }; f'
6. Replace <Service Principal Client ID> with the actual Client ID of your Service Principal. Set the environment variable GIT_USERNAME to the Service Principal's Client ID and GIT_PASSWORD to the PAT generated in step 3. The single quotes around the helper stop the shell from expanding these variables when the config is written, so Git reads their current values each time it needs credentials.
7. Test the Git configuration. To verify that the Git credentials are set up correctly, you can try cloning or pulling the repository using Git commands. For example:
git clone <repository_url>
If the credentials are correctly configured, the repository should be cloned without asking for any additional authentication.
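The credential helper and the clone test above can be sketched as two shell functions that are easy to try out. The function names are hypothetical; GIT_USERNAME and GIT_PASSWORD are the variables described above. Git calls a credential helper with the action (get, store or erase) as its argument and reads key=value lines from its output, so this helper answers only get. Setting GIT_TERMINAL_PROMPT=0 makes Git fail instead of prompting, which turns "it asks for a password" into a clear non-zero exit status:

```shell
# Sketch: an environment-based credential helper plus a non-interactive check.
# Function names are hypothetical; GIT_USERNAME / GIT_PASSWORD hold the
# Service Principal's Client ID and the PAT.

# Git passes the action (get, store or erase) as $1 and reads
# key=value lines from stdout; only "get" produces output here.
env_credential_helper() {
  if [ "${1:-}" = "get" ]; then
    printf 'username=%s\n' "${GIT_USERNAME}"
    printf 'password=%s\n' "${GIT_PASSWORD}"
  fi
}

# GIT_TERMINAL_PROMPT=0 makes git fail instead of asking for credentials,
# so a missing or wrong credential shows up as a non-zero exit status.
check_repo_access() {
  local repo_url="$1"
  if GIT_TERMINAL_PROMPT=0 git ls-remote --heads "$repo_url" >/dev/null 2>&1; then
    echo "ok: $repo_url"
  else
    echo "failed: $repo_url" >&2
    return 1
  fi
}
```

To use the helper outside the current shell, save its body as an executable script named git-credential-env somewhere on PATH and run git config --global credential.helper env; Git then invokes git credential-env get whenever it needs credentials.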
By following these steps, you can set up Git credentials for a Service Principal to access notebooks in Repos (GitHub) without relying on a personal GitHub account.
02-21-2024 02:45 AM
Hi @Simranarora,
thank you for your answer. I may have expressed myself poorly, but this does not fix the issue I actually have. I have already set up the connection via Git credentials using a technical user on the Git provider side (I use Bitbucket, btw). I can also connect and add repositories within Databricks after adding the Git configuration under "user settings > linked account". When I run my dbt workflow on Databricks with my personal user (under "run as"), everything works and dbt can make its updates based on the config in the repository.
What I am looking for is how to detach this from my personal Databricks user. I want to run the workflow task as a service principal, yet I cannot set Git credentials for the service principal, and it also cannot use the credentials I configured via the UI.
Therefore, I tried to deploy a "git credential" via Terraform. This lets me check out my repositories under "Workspace", but the workflow still fails with the error message above.
I hope this makes it more understandable!
03-06-2024 01:09 AM
Hi @Carsten03,
I'm facing the same issue at the moment. Have you been able to solve this issue?
Cheers,
03-06-2024 05:44 AM
Hey @MiroFuoli, unfortunately not 😞. I am using my personal user, and hope that at some point somebody has a solution. If you find anything, let me know!
06-11-2024 01:47 AM
Hi @Carsten03, @MiroFuoli,
I had a very similar issue and finally got it to work, though this depends a lot on your setup. I am using Azure Databricks and Bitbucket; if you have a similar setup, the following will hopefully work for you.
curl -X POST -H 'Content-Type: application/x-www-form-urlencoded' \
https://login.microsoftonline.com/<tenant-id>/oauth2/v2.0/token \
-d 'client_id=<client-id>' \
-d 'grant_type=client_credentials' \
-d 'scope=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d%2F.default' \
-d 'client_secret=<client-secret>'
You can find the <tenant-id> and <client-id> in the Azure portal on the overview page of your service principal. The <client-secret> is the secret value you should have saved when you created it in step 1.
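Rather than copying the token by hand, the token request above can be wrapped so that only the access_token field is kept. This is a sketch under assumptions: the variable names AZURE_TENANT_ID, AZURE_CLIENT_ID and AZURE_CLIENT_SECRET and the function names are hypothetical, and jq must be installed. Using --data-urlencode lets curl encode each value, which matters if the client secret contains characters such as + or &:

```shell
# Sketch: request an Entra ID (Azure AD) token for the service principal.
# AZURE_TENANT_ID, AZURE_CLIENT_ID and AZURE_CLIENT_SECRET are placeholder
# variable names; requires curl and jq.

# Application ID of the Azure Databricks resource, as used in the scope above.
AZURE_DATABRICKS_APP_ID="2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"

# Build the v2.0 token endpoint for a given tenant.
token_endpoint() {
  printf 'https://login.microsoftonline.com/%s/oauth2/v2.0/token\n' "$1"
}

# Client credentials grant; print only the access token.
get_sp_token() {
  curl -sS --fail -X POST "$(token_endpoint "$AZURE_TENANT_ID")" \
    --data-urlencode "client_id=${AZURE_CLIENT_ID}" \
    --data-urlencode "grant_type=client_credentials" \
    --data-urlencode "scope=${AZURE_DATABRICKS_APP_ID}/.default" \
    --data-urlencode "client_secret=${AZURE_CLIENT_SECRET}" \
    | jq -r '.access_token'
}
```

Usage: SP_TOKEN="$(get_sp_token)", then pass $SP_TOKEN as the bearer token in the next request. The token is short-lived, but it is only needed while creating the Git credential; the stored credential itself is the Bitbucket token.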
curl -X POST "https://<databricks-host>.azuredatabricks.net/api/2.0/git-credentials" \
--header 'Authorization: Bearer <azure-service-principal-access-token>' \
--header 'Content-Type: application/json' \
--data '{"personal_access_token" : "<repository-access-token>", "git_username" : "x-token-auth", "git_provider" : "bitbucketCloud" }' | jq .
Note that the value of "git_username" should be "x-token-auth". Apart from that, you simply use the credentials created in the previous two steps.
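This works because the Git credentials API manages the credential of whichever identity makes the call, so sending the service principal's token stores the Bitbucket token for the service principal rather than for your user. A sketch of the same request plus a check that lists the credentials as the service principal; DATABRICKS_HOST (the full workspace host name), SP_TOKEN and BITBUCKET_REPO_TOKEN are placeholder variable names, the function names are hypothetical, and jq is assumed to be installed:

```shell
# Sketch: create and verify the service principal's own Git credential.
# DATABRICKS_HOST, SP_TOKEN and BITBUCKET_REPO_TOKEN are placeholders.

git_credentials_url() {
  # $1 = workspace host name, e.g. adb-1234567890123456.7.azuredatabricks.net
  printf 'https://%s/api/2.0/git-credentials\n' "$1"
}

# POST with the SP's token: the credential is attached to the caller.
create_sp_git_credential() {
  local body
  # jq -n builds the JSON so the token is always quoted correctly.
  body="$(jq -n --arg pat "$BITBUCKET_REPO_TOKEN" \
    '{personal_access_token: $pat, git_username: "x-token-auth", git_provider: "bitbucketCloud"}')"
  curl -sS --fail -X POST "$(git_credentials_url "$DATABRICKS_HOST")" \
    -H "Authorization: Bearer ${SP_TOKEN}" \
    -H 'Content-Type: application/json' \
    --data "$body" | jq .
}

# GET with the SP's token: lists only the caller's own credentials.
list_sp_git_credentials() {
  curl -sS --fail "$(git_credentials_url "$DATABRICKS_HOST")" \
    -H "Authorization: Bearer ${SP_TOKEN}" | jq .
}
```

Listing with a personal token instead shows only your own credential, which may explain why a credential can look present via the CLI while a job running as the service principal still reports missing Git provider credentials.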
I hope that helps you.
Wish you the best!
06-11-2024 02:23 AM
nice, I will have to check that out! I gave up on this already.
06-18-2024 03:02 AM
We have also implemented it like this, and while it works, in my opinion it doesn't cleanly solve the problem of detaching a service principal from a user:
The OAuth client credentials flow exists for use cases like this, and I hope it eventually makes its way to Databricks.
FWIW I have a discussion on this topic: https://community.databricks.com/t5/forums/v5/forumtopicpage.inlinemessagereplyeditor.form.form.form...
06-18-2024 03:04 AM
I created that link using the "Share" button in the post but it's broken, sorry 😂
Here's a working link to the discussion: https://community.databricks.com/t5/data-engineering/git-credentials-for-service-principals-running-...
yesterday
Databricks has updated documentation on authorizing a service principal to access Git folders.
Databricks now offers three options for running jobs that point to code in Git:
1. User PAT - Configure Git credentials & connect a remote repo to Databricks | Databricks on AWS
2. CI/CD - CI/CD with Databricks Git folders | Databricks on AWS
3. Authorizing a Service Principal to access Git folders - Authorize a service principal to access Git folders | Databricks on AWS
Option 3 is the more reliable and maintainable choice for lightweight, less process-oriented jobs; option 2 suits more robust, process-oriented jobs; and option 1 suits individual users testing jobs.
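For option 2, the CI/CD pipeline usually keeps a workspace Git folder on the latest commit of a branch. A minimal sketch, assuming the Repos REST endpoint PATCH /api/2.0/repos/{repo_id} with a branch field; DATABRICKS_HOST, DATABRICKS_TOKEN and REPO_ID are placeholder names and the function names are hypothetical:

```shell
# Sketch: move a workspace Git folder to the head of a branch from CI.
# DATABRICKS_HOST, DATABRICKS_TOKEN and REPO_ID are placeholder names.

repos_url() {
  # $1 = workspace host name, $2 = numeric Git folder (repo) ID
  printf 'https://%s/api/2.0/repos/%s\n' "$1" "$2"
}

# Check out the given branch in the Git folder and pull its latest commit.
update_git_folder() {
  local branch="$1"
  curl -sS --fail -X PATCH "$(repos_url "$DATABRICKS_HOST" "$REPO_ID")" \
    -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
    -H 'Content-Type: application/json' \
    --data "$(printf '{"branch": "%s"}' "$branch")"
}
```

Usage: update_git_folder main, run from the pipeline with a token for the identity that owns the Git credential.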
10 hours ago
If the goal is to run a job under a service principal's security context, deploy that job via Databricks Asset Bundles with the "run_as" property set to the service principal's identifier. This is exactly how I configure most of my jobs, whether they run on all-purpose or jobs compute clusters. I could provide detailed YAML files if needed 🙂
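A minimal sketch of such a bundle, written from the shell so it can live in a setup script. The bundle name, job name, notebook path and the <application-id> placeholder (the service principal's application ID) are hypothetical, and compute settings are deliberately left out:

```shell
# Sketch: create a minimal databricks.yml whose job runs as a service principal.
# Names, the notebook path and <application-id> are placeholders.
cat > databricks.yml <<'EOF'
bundle:
  name: sp_jobs

# All jobs in this bundle run with the service principal's identity.
run_as:
  service_principal_name: "<application-id>"

resources:
  jobs:
    nightly_job:
      name: nightly_job
      tasks:
        - task_key: main
          notebook_task:
            # Relative to this file; the notebook is uploaded on deploy.
            notebook_path: ./src/main_notebook.ipynb
          # Compute settings (job cluster, existing cluster or serverless)
          # are omitted here and must be added for your workspace.
EOF
```

Running databricks bundle validate and then databricks bundle deploy from that directory should create the job with the service principal as its run-as identity. The identity that deploys the bundle generally needs the Service Principal User role on that service principal, which fits setups where the deployment principal differs from the one that runs the pipelines.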
9 hours ago
Alternatively, there is another approach you could use: configure your tasks with relative paths to notebooks and deploy all of them with DAB. Your job then references the deployed notebooks directly, so there is no need to access Git from jobs or notebooks. Git access is delegated to the CI/CD pipeline, which clones the branch you want to deploy. I hope this helps.
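A sketch of the CI/CD side of this approach, where only the pipeline touches Git. The function name, its arguments and the target name are hypothetical; the Databricks CLI is assumed to be installed and already authenticated in the pipeline, for example through environment variables holding the deploying service principal's credentials:

```shell
# Sketch: CI/CD step that clones one branch and deploys the bundle from it.
# Jobs then run the deployed notebook copies and never check out Git.
deploy_branch() {
  local repo_url="$1" branch="$2" target="$3"
  local workdir
  workdir="$(mktemp -d)"
  # Shallow clone of just the branch to deploy; stop if it fails.
  git clone --quiet --depth 1 --branch "$branch" "$repo_url" "$workdir" || return 1
  # Upload notebooks and job definitions to the chosen bundle target.
  (cd "$workdir" && databricks bundle deploy --target "$target")
}
```

Usage: deploy_branch https://bitbucket.org/<workspace>/<repo>.git main prod. Because the Git credential belongs to the pipeline, the jobs themselves need no Git credentials at all.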