Deploying Databricks via ARM, then configuring via databricks-cli: creating a repo results in INVALID_STATE error.

m2chrisp
New Contributor II

Hi,

I'm using databricks-cli to configure a newly-deployed Azure Databricks instance. The ARM deployment works fine, and the databricks-cli commands to create a secrets scope and add users also work just fine.

Then I add a GitCredential to Databricks -- a username/PAT pair for Azure DevOps Repos access, and try to run `databricks repos create --url <url> --provider azureDevOpsServices --path /Repos/ParentDir/RepoName`.

This fails with the error:

Error: b'{"error_code":"INVALID_STATE","message":"Failed to clone repo. Repo may be incomplete. Failure reason: Operation failed: \\"Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.\\", 403, PUT, https://*redacted*.dfs.core.windows.net/jobs/420349711802780/workspace-files/0680a075-b8be-4fa6-a3d7..., AuthenticationFailed, \\"Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature. RequestId:988c44fd-a01f-0023-5ba7-5d893f000000 Time:2023-03-23T16:48:33.5781637Z\\""}'

This looks like Databricks is unable to write the repo's content to the workspace storage account, which is really weird. Weird because, if I then log in to the instance, I can actually see the files from the repo under the Repos menu!
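For anyone scripting around this, the error output above can be unpacked so that the Databricks error code and the storage-side failure text are easier to inspect separately. This is only a minimal sketch: it assumes the CLI prints a single line of the form `Error: b'{...}'` exactly as shown above, and `parse_cli_error` is a hypothetical helper name, not part of databricks-cli.

```python
import ast
import json


def parse_cli_error(output: str) -> dict:
    """Decode the JSON payload from a line shaped like: Error: b'{...}'.

    Assumption: the text after "Error: " is the Python repr of a bytes
    object holding a JSON document, as in the output quoted in this post.
    """
    prefix = "Error: "
    text = output.strip()
    if not text.startswith(prefix):
        raise ValueError("not an 'Error: ...' line")

    # literal_eval safely turns the printed b'...' repr back into bytes.
    raw = ast.literal_eval(text[len(prefix):])
    if not isinstance(raw, bytes):
        raise ValueError("expected a bytes literal after 'Error: '")

    return json.loads(raw.decode("utf-8"))
```

With the output above, `parse_cli_error(...)["error_code"]` would be `INVALID_STATE`, and the `message` field carries the 403 `AuthenticationFailed` response from the storage endpoint.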

But, when clicking into the Repos directories -- although I can see the repo content, the branch button isn't present on the repo browser header. I can click the Git... menu option and successfully re-do a pull, and then the branch button appears.

I'm at a bit of a loss here as to what's happening -- any help appreciated.

Chris

Accepted Solution

Anonymous
Not applicable

@Chris Platts:

The error message you are seeing suggests that there is an issue with the authorization header when trying to clone the repository to the workspace storage account. This may be due to an issue with the GitCredential you added for Azure DevOps Repos access. Here are a few troubleshooting steps you can try:

  1. Check that the GitCredential you added is correct and has the necessary permissions to access the repository. You can verify this by running the `git clone` command locally using the same GitCredential and checking if it works.
  2. Check that the Databricks instance has been granted permission to access the Azure DevOps Repos. You can do this by checking the access policies on the repository in Azure DevOps.
  3. Check that the storage account associated with the workspace is correctly configured and accessible by the Databricks instance. You can do this by trying to write a file to the storage account from the Databricks instance using the Azure Storage SDK.
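The first check can be scripted. The sketch below is one possible way to do it, not something the thread prescribes: it uses `git ls-remote` rather than a full clone, and passes the PAT as a Basic auth header with an empty username, which is the pattern Azure DevOps documents for PATs. The function names are hypothetical. Note that it only proves the token works from your own machine, and that the header is visible in the local process list while git runs.

```python
import base64
import subprocess


def pat_auth_header(pat: str) -> str:
    """Build a Basic auth header for a PAT (empty username, PAT as password)."""
    token = base64.b64encode(f":{pat}".encode("utf-8")).decode("ascii")
    return f"Authorization: Basic {token}"


def ls_remote_command(repo_url: str, pat: str) -> list:
    """Assemble a git command that lists remote refs without cloning."""
    return [
        "git",
        "-c", f"http.extraHeader={pat_auth_header(pat)}",
        "ls-remote",
        repo_url,
    ]


def pat_can_read_repo(repo_url: str, pat: str, timeout: float = 60.0) -> bool:
    """Return True if git can list the remote refs using the PAT."""
    result = subprocess.run(
        ls_remote_command(repo_url, pat),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode == 0
```

Calling `pat_can_read_repo(<url>, <pat>)` with the same URL and token given to Databricks should return `True` if the credential itself is sound.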

Hope this helps!

m2chrisp
New Contributor II

Thanks for the reply, suteja!

This is utterly bizarre, but it started working fine the next day, with no changes at all.

But to address your points, just for anyone else in a similar position to refer to:

  • The credentials were fine. Files and folders were being pulled from the repo. But clearly something was breaking due to the error and the fact that the 'branch' button was missing. Maybe the .git folder wasn't being created?
  • Yep, I ran a clone on my local machine using the token to prove it was correct. And the user who owned the token had the correct access to the repo.
  • Databricks seemed to be able to access its own workspace storage fine everywhere else. For example, my scripts which created folders within the instance's storage worked fine. And again, the repo was partially cloned.

I think maybe the fact that the accounts/principals used were 'brand-new' may have contributed to the error, perhaps? Given a few hours, it did eventually begin working. Perhaps Azure just needed to shake it out a bit.
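If the failure really is transient, as it appeared to be here, a deployment script could retry the repo creation with a growing delay instead of failing outright. The following is only a sketch of that idea. It assumes the CLI exits with a non-zero code on this error, and it does not clean up a partially created repo, which may need removing before a retry can succeed. The root cause was never confirmed in this thread, and the delay values are arbitrary. `create_repo_with_retry` is a hypothetical helper name.

```python
import subprocess
import time


def create_repo_with_retry(command, attempts=5, initial_delay=30.0,
                           run=subprocess.run, sleep=time.sleep):
    """Run `command`, retrying with exponential backoff on non-zero exit.

    `run` and `sleep` are injectable so the logic can be exercised
    without calling the real CLI or actually waiting.
    """
    delay = initial_delay
    last = None
    for attempt in range(1, attempts + 1):
        last = run(command, capture_output=True, text=True)
        if last.returncode == 0:
            return last
        if attempt < attempts:
            sleep(delay)
            delay *= 2  # double the wait before the next attempt
    raise RuntimeError(
        f"command failed after {attempts} attempts: {last.stderr or last.stdout}"
    )
```

The `command` would be the same invocation as in the original post, for example `["databricks", "repos", "create", "--url", url, "--provider", "azureDevOpsServices", "--path", "/Repos/ParentDir/RepoName"]`.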
