Administration & Architecture

Updating a Databricks Git repo from a GitHub Action: how to

NielsMH
New Contributor III

Hi

My company is migrating from Azure DevOps to GitHub, and we have a pipeline in Azure DevOps that updates (syncs) Databricks Repos whenever a pull request is made to the development branch. The Azure DevOps pipeline (which works) looks like this:


trigger:
- development

pool:
  vmImage: ubuntu-latest

steps:
- task: AzureKeyVault@2
  inputs:
    azureSubscription: 'Azure Service Connection'
    KeyVaultName: $(keyvault_name)
    SecretsFilter: '*'
    RunAsPreJob: true
- task: Bash@3
  inputs:
    targetType: 'inline'
    script: |
      echo "Setup Databricks environmental variables (to be able to autoconfig databricks-cli)"
      export DATABRICKS_HOST=$(DATABRICKSHOST)
      export DATABRICKS_TOKEN=$(DATABRICKSTOKEN)
      echo ${#DATABRICKS_HOST}
      echo Install Databricks CLI
      pip install databricks-cli
      echo Update Repo
      databricks repos update --repo-id $(repo_id) --branch $(branch)


I have rewritten it as a GitHub Action (I basically dropped the step that fetches secrets from Azure Key Vault and read them directly from GitHub secrets instead), like this:


name: Sync to Databricks Repo

on:
  push:
    branches:
      - development
  workflow_dispatch:

jobs:
  sync-to-databricks:
    runs-on: ubuntu-latest

    env:
      DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
      DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}

    steps:
      - name: Check out repository
        uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.x'

      - name: Install and Configure Databricks CLI
        run: |
          echo "Setup Databricks environmental variables"
          echo "Host length: ${#DATABRICKS_HOST}"

          echo "Installing Databricks CLI"
          pip install databricks-cli

          echo "Updating Repo to development branch"
          databricks repos update --repo-id ${{ secrets.REPO_ID }} --branch ${{ secrets.BRANCH }}

However, I run into authorization issues. The repo I am trying to update is in a Standard Databricks workspace (not Premium). The Git provider I use is "GitHub", and the connection to the GitHub repository is set up with a GitHub access token that has all relevant permissions (repo, admin, etc.). The DATABRICKS_TOKEN I refer to is a Databricks token I created for my workspace admin user. I have double-checked all secrets, so I can't for the life of me figure out why it goes wrong.

I can run "databricks repos list" with the same credentials from the terminal without problems, but if I run "databricks repos update 12345xxx --branch development" I receive "Error: Missing Git provider credentials. Go to User Settings > Git Integration to set up your Git credentials.", even though I can pull etc. from the Databricks repo UI with the same tokens. I get the same error when I call the Databricks API with curl.

Can anyone help me out here?  
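One way to take the CLI out of the picture is to send the Repos API request directly, as in the curl test mentioned above. A minimal sketch, assuming bash, curl 7.76 or newer (for `--fail-with-body`), and `DATABRICKS_HOST`/`DATABRICKS_TOKEN` exported as in the workflow; `repo_update_payload`, `repo_update_url` and `update_repo` are hypothetical helper names, and the endpoint is the Repos REST API `PATCH /api/2.0/repos/{repo_id}` with a `branch` field:

```shell
#!/usr/bin/env bash
# Sketch: switch a Databricks repo to a branch via the Repos REST API with curl.
# Assumes DATABRICKS_HOST and DATABRICKS_TOKEN are exported and curl >= 7.76.

# JSON body for PATCH /api/2.0/repos/{repo_id}.
# Assumes the branch name contains no double quotes or backslashes.
repo_update_payload() {
  printf '{"branch": "%s"}' "$1"
}

# Endpoint URL; strips one trailing slash from the host if present.
repo_update_url() {
  printf '%s/api/2.0/repos/%s' "${1%/}" "$2"
}

# Check out BRANCH in the Databricks repo with id REPO_ID.
# Usage: update_repo REPO_ID BRANCH
update_repo() {
  # Fail fast (exit the script) if the credentials are missing.
  : "${DATABRICKS_HOST:?DATABRICKS_HOST is not set}"
  : "${DATABRICKS_TOKEN:?DATABRICKS_TOKEN is not set}"
  # --fail-with-body makes curl exit non-zero on HTTP errors but still
  # print the JSON error message returned by Databricks.
  curl --fail-with-body --silent --show-error -X PATCH \
    "$(repo_update_url "$DATABRICKS_HOST" "$1")" \
    -H "Authorization: Bearer $DATABRICKS_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$(repo_update_payload "$2")"
}
```

Calling `update_repo "$REPO_ID" development` from a workflow step should report the same "Missing Git provider credentials" error if the problem lies with the identity behind the token rather than with the CLI.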

Walter_C
Databricks Employee

It seems like the issue you're encountering is that Databricks finds no Git provider credentials when the repo is updated via GitHub Actions. Git credentials are stored per Databricks user, and the Repos API uses the credentials of the user who owns the token making the call. Here are a few steps you can take to resolve this issue:

  1. Verify Git Integration in Databricks: Ensure that GitHub credentials are set up for the same Databricks user whose token the workflow uses. While logged in as that user, go to User Settings > Git Integration and confirm that the provider, username, and token are correct. This step is crucial, as the error message indicates missing Git provider credentials.

  2. Use the Databricks GitHub App: Instead of a Personal Access Token (PAT), consider using the Databricks GitHub App for integration. The app handles token renewal automatically and provides more granular control over access. You can install and configure it by following the instructions in the Databricks documentation.

  3. Check Token Permissions: Double-check the permissions of the GitHub Access Token you are using. Ensure it has the necessary scopes such as repo, admin:repo_hook, and any other relevant permissions required for updating the repository.

  4. Environment Variables in GitHub Actions: Ensure that the environment variables in your GitHub Actions workflow are correctly set. The DATABRICKS_HOST, DATABRICKS_TOKEN, REPO_ID, and BRANCH should be correctly referenced from the GitHub secrets.
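Steps 1 and 4 can be checked from the workflow itself, using the same token the job uses. The sketch below is an illustrative example under stated assumptions, not an official tool: it asks which user the token belongs to (SCIM `GET /api/2.0/preview/scim/v2/Me`), lists that user's Git credentials (`GET /api/2.0/git-credentials`), and can register a GitHub PAT for that user (`POST /api/2.0/git-credentials` with `git_provider` set to `gitHub`). It assumes bash and curl 7.76 or newer; `GH_USERNAME` and `GH_PAT` are hypothetical secret names, and all helper functions are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: inspect and register Git credentials for the Databricks user behind
# DATABRICKS_TOKEN (Git credentials are stored per user).
# Assumes DATABRICKS_HOST and DATABRICKS_TOKEN are exported and curl >= 7.76.

# JSON body for POST /api/2.0/git-credentials ("gitHub" is the provider value
# for github.com). Assumes username and token contain no quotes or backslashes.
git_credentials_payload() {
  printf '{"git_provider": "gitHub", "git_username": "%s", "personal_access_token": "%s"}' "$1" "$2"
}

# Thin curl wrapper. Usage: dbx_api METHOD PATH [extra curl args...]
dbx_api() {
  local method=$1 path=$2
  shift 2
  # Fail fast (exit the script) if the credentials are missing.
  : "${DATABRICKS_HOST:?DATABRICKS_HOST is not set}"
  : "${DATABRICKS_TOKEN:?DATABRICKS_TOKEN is not set}"
  curl --fail-with-body --silent --show-error -X "$method" \
    "${DATABRICKS_HOST%/}/api/2.0/$path" \
    -H "Authorization: Bearer $DATABRICKS_TOKEN" \
    -H "Content-Type: application/json" \
    "$@"
}

# Which Databricks user owns the token? Compare with the user who set up
# Git integration in the UI.
whoami_databricks() {
  dbx_api GET preview/scim/v2/Me
}

# Git credentials registered for that user; no entries means none are set up.
list_git_credentials() {
  dbx_api GET git-credentials
}

# Register a GitHub username and PAT for that user.
# Usage: create_git_credentials GH_USERNAME GH_PAT
create_git_credentials() {
  dbx_api POST git-credentials -d "$(git_credentials_payload "$1" "$2")"
}
```

For example, a workflow step could run `whoami_databricks` and `list_git_credentials` before `databricks repos update`. If no credential is listed for that user, `create_git_credentials "$GH_USERNAME" "$GH_PAT"` registers one, after which the update should be able to authenticate to GitHub. If a credential already exists, it may need to be updated rather than created again.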
