
How to deploy a Databricks Asset Bundle from Azure DevOps using a Service Principal?

PabloCSD
Contributor III

I have a CI/CD process that deploys to staging after a Pull Request (PR) to main.

It currently works in Azure Pipelines using a Personal Access Token.

Deploying locally with a Service Principal works (https://community.databricks.com/t5/administration-architecture/use-a-service-principal-token-instea...).

But I want to deploy from Azure Pipelines using the Service Principal. How can I do that?

If it helps, here is my current Azure Pipelines YAML:

    jobs:
      - job: onMainPullRequestJob
        workspace:
          clean: all
        steps:
          - task: UsePythonVersion@0
            displayName: Set up Python 3.10
            inputs:
              versionSpec: '3.10'

          - script:  curl -sSL https://install.python-poetry.org | python - --version 1.8.3
            displayName: Install Poetry

          - script: poetry config http-basic.$(ARTIFACT-FEED) $(USERNAME-FEED) $(System.AccessToken)
            displayName: Configure credentials to Feed

          - script: poetry install --with dev,test
            displayName: Install dependencies

          - script: poetry run pre-commit run --all-files
            displayName:  Run pre-commit check

          - script: poetry run pytest tests/unit -s -vvv
            displayName: Run unit tests

          - script: curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
            displayName: Install Databricks CLI

          - script: |
              databricks bundle deploy --target staging
            env:
              DATABRICKS_HOST: $(DATABRICKS-HOST)
              DATABRICKS_TOKEN: $(DATABRICKS-TOKEN)
            displayName: Deploy the job

          - script: |
              databricks bundle run --target staging dab_job
            env:
              DATABRICKS_HOST: $(DATABRICKS-HOST)
              DATABRICKS_TOKEN: $(DATABRICKS-TOKEN)
            displayName: Launch workflow
Accepted Solution

PabloCSD
Contributor III

I needed to deploy a job from an Azure Pipelines CI/CD pipeline without an interactive OAuth login; this is how I did it:

First, you need a Service Principal configured for your workspace: generate it there so that you end up with its client ID and client secret.

Then there are a couple of prerequisites:

  1. Have an Azure DevOps pipeline for your project, with its corresponding azure-pipelines.yaml for CI/CD.
  2. Add pipeline variables for the host, the client ID and the client secret (referenced below as YOUR-DATABRICKS-HOST, YOUR-SERVICE-PRINCIPAL-CLIENT-ID and YOUR-SERVICE-PRINCIPAL-SECRET), as sketched right after this list.
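
For reference, here is a minimal sketch of how those values could be wired into the pipeline through a variable group (the group name databricks-sp-vars is just an assumption; secrets cannot be declared directly in YAML, so keep them in the group or a linked Key Vault and mark them as secret):

variables:
  - group: databricks-sp-vars   # assumed variable group holding:
                                #   YOUR-DATABRICKS-HOST
                                #   YOUR-SERVICE-PRINCIPAL-CLIENT-ID
                                #   YOUR-SERVICE-PRINCIPAL-SECRET (marked as secret)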

With that configured, here is the azure-pipelines.yml file I use:

pool: Azure Pipelines

trigger: none

pr:
  autoCancel: true
  branches:
    include:
      - main

stages:
  - stage: onDevPullRequest
    # Similar to onMainPullRequest
    ...

  - stage: onMainPullRequest
    # This stage is triggered when a PR is created into main
    # For instance, if you create a PR from feature/1.0.0 to main, this stage will be triggered
    # This stage will be skipped if the PR is created from release/* to main
    condition: |
      and(
        not(startsWith(variables['System.PullRequest.SourceBranch'], 'refs/heads/release')),
        startsWith(variables['System.PullRequest.TargetBranch'], 'refs/heads/main')
      )
    jobs:
      - job: onMainPullRequestJob
        workspace:
          clean: all
        steps:
          - task: UsePythonVersion@0
            displayName: Set up Python 3.10
            inputs:
              versionSpec: '3.10'

          - script:  curl -sSL https://install.python-poetry.org | python - --version 1.8.3
            displayName: Install Poetry

          - script: poetry config http-basic.$(ARTIFACT-FEED) $(USERNAME-FEED) $(System.AccessToken)
            displayName: Configure credential with the artifact feed

          - script: poetry install --with dev,test
            displayName: Install dependencies

          - script: poetry run pre-commit run --all-files
            displayName:  Run pre-commit check

          - script: poetry run pytest tests/unit -s -vvv
            displayName: Run unit tests

          - bash: |
              # Install Databricks CLI
              curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh

              # Verify installation
              databricks --version

              # Path to the Databricks config file
              file="$HOME/.databrickscfg"

              # If it already exists, remove it
              if [ -f "$file" ] ; then
                  rm "$file"
              fi

              # Define the profile
              # ($(...) values are Azure DevOps pipeline variables, substituted before the script runs)
              echo "[YOUR_WORKSPACE_PROFILE]" >> "$file"
              echo "host = $(YOUR-DATABRICKS-HOST)" >> "$file"
              echo "token = $(YOUR-SERVICE-PRINCIPAL-SECRET)" >> "$file"

              # Show the file (note: this prints the secret to the build log)
              cat "$file"

              # Export token
              export DATABRICKS_TOKEN_SP=$(curl --request POST \
                --url https://adb-XXXXXXXXXXXXXXXX.YY.azuredatabricks.net/oidc/v1/token \
                --user "$(YOUR-SERVICE-PRINCIPAL-CLIENT-ID):$(YOUR-SERVICE-PRINCIPAL-SECRET)" \
                --data 'grant_type=client_credentials&scope=all-apis' | jq -r '.access_token'
                )

              # Export for the usage of the next task
              echo "##vso[task.setvariable variable=DATABRICKS_TOKEN_SP]$DATABRICKS_TOKEN_SP"
            displayName: Install Databricks CLI and create config file

          - script: |
              databricks bundle deploy --target staging
            env:
              DATABRICKS_HOST: $(YOUR-DATABRICKS-HOST)
              DATABRICKS_TOKEN: $(DATABRICKS_TOKEN_SP)
            displayName: Deploy the job

          - script: |
              databricks bundle run --target staging dab_your_workflow
            env:
              DATABRICKS_HOST: $(YOUR-DATABRICKS-HOST)
              DATABRICKS_TOKEN: $(DATABRICKS_TOKEN_SP)
            displayName: Launch workflow

  - stage: onRelease
    # Similar to onMainPullRequest
    ...

Steps:

  1. Install the Databricks CLI.
  2. Generate a Databricks configuration file (~/.databrickscfg) for the profile you are deploying with.
  3. Generate a temporary token for deploying the job.
  4. Export the token as a pipeline variable so the deploy task can use it.
  5. Deploy.

This way we make sure the same environment variable, DATABRICKS_TOKEN_SP, is available to the following tasks. We also avoid relying on a user's personal access token or an interactive OAuth login for the CI/CD setup, which is a great step toward user-independent CI/CD processes.
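
As an optional sanity check (not part of the original pipeline, just a sketch reusing the variables above), you could verify that the service principal token works before deploying:

- script: |
    # Should print the service principal's identity if the token is valid
    databricks current-user me
  env:
    DATABRICKS_HOST: $(YOUR-DATABRICKS-HOST)
    DATABRICKS_TOKEN: $(DATABRICKS_TOKEN_SP)
  displayName: Verify service principal authentication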

If you need a template for the databricks.yml used above, here it is:

bundle:
  name: dab_your_workflow

# Declare to Databricks Asset Bundles that this is a Python wheel project
# (the wheel is built from the project's "pyproject.toml" via poetry build)
artifacts:
  default:
    type: whl
    build: poetry build
    path: .

resources:
  jobs:
    dab_your_workflow:
      name: dab_your_workflow
      tasks:
        - task_key: your_workflow_task
          job_cluster_key: ${bundle.target}-${bundle.name}-job-cluster
          python_wheel_task:
             package_name: dab_your_workflow
             entry_point: your_workflow_entry_point
             parameters:
               - --conf-file
               - "/Workspace${workspace.root_path}/files/conf/tasks/your_workflow_task_config.yml"
               - --env
               - ${bundle.target}
          libraries:
            - whl: ./dist/*.whl

targets:
  dev:
    # Similar to Staging
    ...

  prod:
    # Similar to Staging
    ...

  staging:
    mode: production
    workspace:
      host: https://adb-XXXXXXXXXXXXXXXX.YY.azuredatabricks.net/
    run_as:
      service_principal_name: AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE
    resources:
      jobs:
        dab_your_workflow:
          job_clusters:
            - job_cluster_key: ${bundle.target}-${bundle.name}-job-cluster
              new_cluster:
                num_workers: 2
                spark_version: "14.3.x-cpu-ml-scala2.12"  # Specify the Spark version
                spark_conf:
                  # You can specify the Spark configuration
                node_type_id: Standard_F8 # Specify the node type
                spark_env_vars:
                  # You can specify the Spark environment variables for example:
                  PIP_EXTRA_INDEX_URL:  "{{secrets/kv-your-key-vault/your-url-for-pip-extra-index-url}}"
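
Before deploying, you can also check that the bundle configuration resolves correctly for the target (a standard CLI command, shown here for the staging target used above):

# Resolves the bundle configuration and reports any errors for the chosen target
databricks bundle validate --target staging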

I'm sure there are ways to improve this answer, so if you see how, please leave a comment.
