Run workflow using git integration with service principal

Carsten03
New Contributor III

Hi,

I want to run a dbt workflow task and would like to use the git integration for that. Using my personal user I am able to do so but I am running my workflows using a service principal.

I added git credentials and the repository using terraform. I am able to use the repositories in my workspace, but when I trigger the dbt job I get this message:

 

[RepositoryCheckoutFailed] Failed to checkout Git repository: 
PERMISSION_DENIED: Missing Git provider credentials.
Go to User Settings > Git Integration to set up your Git credentials

I want to run the workflow with a service principal as an owner and therefore don't want to set the git credentials for my personal user. How can the service principal get access to the git credential / repository? I have checked via the cli that the credential is available on my workspace. I couldn't find any documentation on how to grant permissions to a credential.

Note that my deployment principal is not the same as the principal that runs my pipelines.

I hope somebody is able to help me with this.

4 REPLIES

Simranarora
New Contributor III

To set up Git credentials for a Service Principal to access notebooks in Repos (GitHub) without any dependencies on a personal GitHub account, you can follow these steps:

  1. Create a Service Principal in Azure Active Directory (Azure AD) if you haven't already. This will be used to authenticate with Azure services.

  2. Assign the necessary permissions to the Service Principal. You will need to grant it appropriate permissions to access the GitHub repository where the notebooks are stored. This can be done by adding the Service Principal to the repository with the required access level (e.g., read, write, or admin).

  3. Generate a Personal Access Token (PAT) on GitHub. This token will serve as the credentials for the Service Principal to authenticate with GitHub. A PAT belongs to a GitHub account rather than to a repository: in the account's Settings, open "Developer settings" > "Personal access tokens" and generate a new token. Grant it only the scopes and permissions needed to access the repository and perform the required actions.

  4. Store the generated PAT securely. Treat it like a password and keep it in a key vault or secret management system, such as the one provided by your cloud provider.

  5. Configure Git to use the Service Principal and the PAT. On the machine or environment where the jobs will run, set up Git to use the Service Principal's credentials. Run the following commands in a terminal or command prompt:

git config --global credential.username <Service Principal Client ID>
git config --global credential.helper '!f() { echo "username=$GIT_USERNAME"; echo "password=$GIT_PASSWORD"; }; f'

  6. Replace <Service Principal Client ID> with the actual Client ID of your Service Principal. Set the GIT_USERNAME environment variable to the Service Principal's Client ID and GIT_PASSWORD to the PAT generated in step 3. The single quotes around the helper keep the shell from expanding the variables when the command is run, so they are read each time Git calls the helper.

  7. Test the Git configuration. To verify that the Git credentials are set up correctly, try cloning or pulling the repository. For example:

git clone <repository_url>

If the credentials are correctly configured, the repository is cloned without prompting for any additional authentication.

By following these steps, you can set up Git credentials for a Service Principal to access notebooks in Repos (GitHub) without relying on a personal GitHub account.
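
The credential helper from the steps above can also be written as a named shell function, which is easier to read and to check in isolation. This is only a sketch: the function name `git_env_credentials` is made up for illustration, and it assumes that GIT_USERNAME and GIT_PASSWORD are exported in the environment where Git runs (for example, loaded from a secret store).

```shell
#!/bin/sh
# Sketch only: GIT_USERNAME and GIT_PASSWORD are assumed to be exported in the
# environment, for example loaded from a secret store.

# Print credentials in the key=value format Git expects from a credential helper.
git_env_credentials() {
    # Abort with a message if a variable is missing, rather than emitting
    # empty credentials.
    : "${GIT_USERNAME:?GIT_USERNAME is not set}"
    : "${GIT_PASSWORD:?GIT_PASSWORD is not set}"
    printf 'username=%s\n' "$GIT_USERNAME"
    printf 'password=%s\n' "$GIT_PASSWORD"
}

# To register an equivalent inline helper (not executed here):
#   git config --global credential.helper \
#     '!f() { echo "username=$GIT_USERNAME"; echo "password=$GIT_PASSWORD"; }; f'
```

Calling `git_env_credentials` with both variables set prints one `username=...` line and one `password=...` line, which is the format Git reads from a helper.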

 
Best Regards,
Simran

Carsten03
New Contributor III

Hi @Simranarora ,

thank you for your answer. I may have expressed myself poorly, but this does not fix the issue I actually have. I have already made the connection via git credentials using a technical user on the git provider side (I use Bitbucket, btw.). I am also able to connect and add repositories within Databricks when adding the git configuration under "user settings > linked account". When running my workflow on Databricks (dbt) with my personal user (under "run as"), everything works and dbt can make its updates based on the config in the repository.

What I am looking for is a way to detach this from my personal user on Databricks. I want to run the workflow task as a service principal. Yet I am not able to set the git credentials for the service principal, and it also cannot use the configured credentials that I added via the UI.

Therefore I have tried to deploy a "git credential" via terraform. This also allows me to check out my repositories under "Workspace". But within the workflow it fails with the above-mentioned error message.
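
For illustration, creating a Git credential outside Terraform could look roughly like the sketch below. It assumes the Databricks Git Credentials REST endpoint `/api/2.0/git-credentials` with the fields `git_provider`, `git_username`, and `personal_access_token`, and the provider value `bitbucketCloud`; these should be checked against the API reference. `DATABRICKS_HOST`, `DATABRICKS_TOKEN`, and both function names are placeholders. The credential is created for whichever identity the bearer token belongs to.

```shell
#!/bin/sh
# Sketch only. DATABRICKS_HOST and DATABRICKS_TOKEN are assumed environment
# variables; the function names are hypothetical.

# Build the JSON request body. Values are inserted verbatim, so they must not
# contain quotes or backslashes (use a JSON tool such as jq for real input).
build_git_credential_payload() {
    printf '{"git_provider":"%s","git_username":"%s","personal_access_token":"%s"}' \
        "$1" "$2" "$3"
}

# Send the request. Defined here but not called, so nothing is sent when this
# file is sourced.
create_git_credential() {
    curl --fail --silent --show-error \
        --request POST "${DATABRICKS_HOST}/api/2.0/git-credentials" \
        --header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
        --header "Content-Type: application/json" \
        --data "$(build_git_credential_payload "$1" "$2" "$3")"
}
```

A call would take the provider, the git user name, and the token, for example `create_git_credential bitbucketCloud <git_username> <token>`.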

I hope this makes it more understandable!

MiroFuoli
New Contributor II

Hi @Carsten03 ,

I'm facing the same issue at the moment. Have you been able to solve this issue?

Cheers,

Carsten03
New Contributor III

Hey @MiroFuoli, unfortunately not 😞. I am using my personal user, and hope that at some point somebody has a solution. If you find anything, let me know! 
