
Can I use a Git provider with a Service Principal in a job?

Yuki
New Contributor II

Hi everyone,

I'm trying to use a Git provider in a Databricks job.

At first, I was using my personal user account for `Run as`.

But when I changed `Run as` to a Service Principal, the job failed with a permission error.

I can't find a way to solve it.

Is it possible to make this configuration work?


3 REPLIES

Kaniz_Fatma
Community Manager

Hi @Yuki, Certainly! When using a Service Principal to run Databricks jobs and encountering permission errors with the Git provider, here are some steps you can take to troubleshoot and resolve the issue:

 

Confirm Git Integration Settings:

  • Go to User Settings > Linked accounts in Databricks.
  • Ensure that you have selected the correct Git provider (e.g., GitHub, GitLab, Bitbucket).
  • Enter both your Git provider username and personal access token (PAT).
  • Note that legacy Git integrations did not require a username, so you might need to add one for Databricks Repos.
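
A service principal cannot open User Settings, so the provider, username, and token listed above would have to be registered some other way. One possibility is the Databricks Git credentials REST API, called with a token that belongs to the service principal. The sketch below is only an illustration under assumptions, not a confirmed fix for this thread: the endpoint path `/api/2.0/git-credentials` and the field names should be checked against the current REST API reference, and `build_git_credential_request`, the workspace URL, and the token values are hypothetical placeholders.

```python
import json
import urllib.request


def build_git_credential_request(workspace_url, sp_token, git_provider,
                                 git_username, git_pat):
    """Build (but do not send) a request that registers a Git credential.

    sp_token is assumed to be a Databricks token issued for the service
    principal, so the credential is stored for that identity, not a user.
    """
    payload = {
        "git_provider": git_provider,        # e.g. "gitHub" or "gitLab"
        "git_username": git_username,
        "personal_access_token": git_pat,
    }
    return urllib.request.Request(
        # Assumed endpoint path; verify against the REST API reference.
        url=workspace_url.rstrip("/") + "/api/2.0/git-credentials",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + sp_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical values for illustration only.
req = build_git_credential_request(
    "https://example.cloud.databricks.com/", "<sp-token>",
    "gitHub", "<username>", "<personal-access-token>",
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to inspect the URL and payload before any credential leaves the machine.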

Verify Repo Access:

  • Make sure your personal access token or app password has the correct access to the repository.
  • If your Git provider uses Single Sign-On (SSO), authorize your tokens for SSO.

Test with Git Command Line:

  • Use the Git command line to test your token: `git clone https://<username>:<personal-access-token>@github.com/<org>/<repo-name>.git`
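
The command-line check above could also be scripted. The sketch below swaps `git clone` for `git ls-remote`, which only lists remote branches and so avoids downloading the repository; that substitution is a choice made here, not something the reply prescribes. The helper names `build_auth_url` and `token_can_read`, the `host` default, and the sample values are hypothetical, and a `git` executable is assumed to be on the PATH.

```python
import os
import subprocess
from urllib.parse import quote


def build_auth_url(username, token, org, repo, host="github.com"):
    """Return an HTTPS remote URL with percent-encoded credentials embedded."""
    return "https://{}:{}@{}/{}/{}.git".format(
        quote(username, safe=""), quote(token, safe=""), host, org, repo
    )


def token_can_read(remote_url, timeout=60):
    """Return True if `git ls-remote` succeeds against remote_url."""
    result = subprocess.run(
        ["git", "ls-remote", "--heads", remote_url],
        capture_output=True,
        text=True,
        timeout=timeout,
        # Fail instead of prompting for a password when the token is rejected.
        env={**os.environ, "GIT_TERMINAL_PROMPT": "0"},
    )
    return result.returncode == 0


# Hypothetical credentials; the token contains characters that need encoding.
url = build_auth_url("ci-bot", "tok/en+1", "my-org", "my-repo")
# token_can_read(url) would contact the Git server, so it is not called here.
```

Percent-encoding matters because a token containing `/`, `+`, or `@` would otherwise produce a malformed URL and a misleading authentication failure.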

Secure Connection (SSL) Problems:

  • If you encounter SSL problems, ensure that your Git server is accessible from Azure Databricks.

Timeout Errors:

  • Expensive operations (e.g., cloning large repos) might result in timeout errors. These operations could complete in the background.
  • Consider using sparse checkout for large repos.

404 Errors:

  • If you receive a 404 error when opening a non-notebook file, wait a few minutes and try again. There can be a delay between workspace enabling and webapp configuration updates.

Resolve Notebook Name Conflicts:

  • Conflicting notebook names (similar or identical filenames) can cause issues when creating a repo or pull request.
  • Ensure that folders do not contain notebooks with the same name as other notebooks, files, or folders (excluding file extensions).
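
As a rough illustration of the rule above, the following sketch flags entries in the same folder whose names collide once file extensions are dropped. It is a simplified check under assumptions: `find_name_conflicts` is a hypothetical helper, only the last extension is stripped, names are compared case-sensitively, and the sample paths are invented. The actual conflict detection in Databricks may differ.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_name_conflicts(paths):
    """Group paths that share a folder and a base name once the extension is dropped."""
    groups = defaultdict(list)
    for p in paths:
        path = PurePosixPath(p)
        # path.stem removes only the final suffix, e.g. "load.py" -> "load".
        groups[(str(path.parent), path.stem)].append(p)
    # Keep only the groups where more than one entry maps to the same name.
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


# Invented repository listing for illustration.
conflicts = find_name_conflicts([
    "etl/load.py", "etl/load.sql", "etl/load", "etl/transform.py", "jobs/load.py",
])
for (folder, name), entries in conflicts.items():
    print(folder, name, entries)
```

Running a check like this against a repository listing before creating the repo or pull request could surface the collisions early.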

Remember to adjust these settings based on your specific Git provider and repository configuration. With these steps, you should be able to successfully use the Git provider in your Databricks jobs! 🚀🔧

martindlarsson
New Contributor III

@Kaniz_Fatma you mentioned doing this with a service principal at the top of your answer and then gave no instructions on how to do just that! How is one supposed to go to User Settings > Linked accounts as a service principal?

martindlarsson
New Contributor III

The documentation is lacking in this area, even though this should be easy to set up. Instead, we are forced to search through community topics such as this one.
