source set to GIT for Databricks Asset Bundle notebook_task - git authentication fails on run

Еmil
New Contributor III

My post was marked as spam when I tried to post the description of my issue, so I have posted the question on Stack Overflow instead.

1 ACCEPTED SOLUTION

Accepted Solutions

Kaniz
Community Manager

Hi @Еmil, I've read through your question and believe I have a solution for you.

Here's a response to your question:

  • Since your job runs as a service principal, consider using OAuth M2M authentication for accessing your Azure DevOps Git repository.
  • Ensure that the service principal has the necessary permissions to read from the Git repo.
  • You can obtain an access token for the service principal and use it in your Databricks job to authenticate with Azure DevOps.
  • As you mentioned, there isn’t a direct way to specify a git credential ID in the asset bundle YAML.
  • However, you can programmatically create a new Git credential using the Databricks REST API.
  • Here’s a high-level approach:
    • Obtain an access token for the service principal.
    • Use the Databricks REST API to create a new Git credential with the access token.
    • In your asset bundle YAML, set the task source to GIT so that the run uses this newly created credential.
  • Alternatively, although it is not ideal, you can store the Git credentials (such as a username and personal access token) as secrets in your Databricks workspace and then reference these secrets in your asset bundle YAML.
  • While this approach works, it’s less secure because the secrets are accessible within the workspace.
  • Please send us feedback about the need for more explicit Git credential management in asset bundles.
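
The "create a new Git credential" step above could be sketched roughly as follows. This is a minimal illustration, not a confirmed recipe: it assumes the Git Credentials endpoint `POST /api/2.0/git-credentials` with the `azureDevOpsServices` provider, and it assumes both tokens have already been obtained elsewhere. The helper name `build_create_credential_request` and the placeholder host and values are hypothetical.

```python
import json
import urllib.request


def build_create_credential_request(host, databricks_token, git_username, git_token,
                                    git_provider="azureDevOpsServices"):
    """Build (but do not send) a request that creates a Git credential.

    Assumptions: `databricks_token` authenticates the service principal to the
    workspace, and `git_token` is a token Azure DevOps accepts for that identity.
    """
    body = {
        "git_provider": git_provider,
        "git_username": git_username,
        "personal_access_token": git_token,
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/git-credentials",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {databricks_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values only; replace them with your workspace URL and tokens.
    request = build_create_credential_request(
        "https://example.invalid", "<databricks-token>", "<sp-name>", "<devops-token>"
    )
    print(request.get_method(), request.full_url)
    # urllib.request.urlopen(request) would actually send it.
```

Building the request separately from sending it makes it easy to inspect in a pipeline log (minus the tokens) before anything is changed in the workspace.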

If you have any further questions or need clarification, feel free to ask.


3 REPLIES

Kaniz
Community Manager

Hi @Еmil, I'm sorry for any inconvenience caused by your post being marked as spam. Although you've posted your question on Stack Overflow, we encourage you to post it on the Databricks community forum as well. That will help your question reach a wider audience and increase the chances of receiving a helpful response. Thank you for your understanding.

Еmil
New Contributor III

Hi @Kaniz,

Thank you for sorting out the "spam" flag and, above all, for your answer.

I had arrived at roughly the same high-level approach you outline, but I did not like the idea of storing secrets in the workspace. I ended up using WORKSPACE as the source, with the following deployment workflow:

  • obtain access token for the service principal
  • check whether a git credential already exists and delete it if it does (otherwise I get an error: only one git credential is allowed), then create a new git credential
  • use the git credential to clone the git repo into a new or existing Databricks repo (checking for it and creating it if it does not exist) on a specific branch (the branch name is an Azure DevOps pipeline parameter)
  • run the asset `bundle deploy ...` with the source in the YAML set to WORKSPACE, pointing to the code in the Databricks repo updated in the previous step
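
For anyone wanting to script the credential and repo steps above, here is a rough sketch of the order of calls. It assumes the Git Credentials and Repos REST endpoints (`/api/2.0/git-credentials`, `/api/2.0/repos`); the `api(method, path, payload)` callable is a hypothetical helper that you would implement yourself to send an authenticated request (query parameters for GET, JSON body otherwise) and return the parsed JSON response as a dict.

```python
def replace_git_credential(api, git_username, git_token,
                           git_provider="azureDevOpsServices"):
    """Delete any existing Git credential of the caller, then create a new one."""
    existing = api("GET", "/api/2.0/git-credentials").get("credentials", [])
    for cred in existing:
        # Only one credential is allowed, so old ones must go first.
        api("DELETE", f"/api/2.0/git-credentials/{cred['credential_id']}")
    return api("POST", "/api/2.0/git-credentials", {
        "git_provider": git_provider,
        "git_username": git_username,
        "personal_access_token": git_token,
    })


def sync_repo(api, repo_url, repo_path, branch,
              git_provider="azureDevOpsServices"):
    """Create the Databricks repo at repo_path if missing, then check out branch."""
    repos = api("GET", "/api/2.0/repos", {"path_prefix": repo_path}).get("repos", [])
    repo = next((r for r in repos if r.get("path") == repo_path), None)
    if repo is None:
        repo = api("POST", "/api/2.0/repos", {
            "url": repo_url,
            "provider": git_provider,
            "path": repo_path,
        })
    # Updating the branch also pulls the latest commit of that branch.
    return api("PATCH", f"/api/2.0/repos/{repo['id']}", {"branch": branch})
```

In a pipeline, these two calls would run after the token has been obtained and before `bundle deploy ...`, with the branch name passed in from the pipeline parameter.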

It would be nice to know whether improvements around this issue are on your roadmap.
For example, at the moment I find it impossible to create a workflow job in the Databricks UI and set its source to a Git repo.

Thanks,

Emil
