10-14-2024 05:55 AM
I have a CI/CD process that deploys to staging after a Pull Request (PR) to main.
It works in Azure Pipelines using a Personal Access Token.
Deploying locally with a Service Principal works (https://community.databricks.com/t5/administration-architecture/use-a-service-principal-token-instea...).
But I want to deploy from Azure Pipelines using the Service Principal. How can I do that?
In case it helps, here is my current Azure Pipelines YAML:
jobs:
  - job: onMainPullRequestJob
    workspace:
      clean: all
    steps:
      - task: UsePythonVersion@0
        displayName: Set up Python 3.10
        inputs:
          versionSpec: '3.10'
      - script: curl -sSL https://install.python-poetry.org | python - --version 1.8.3
        displayName: Install Poetry
      - script: poetry config http-basic.$(ARTIFACT-FEED) $(USERNAME-FEED) $(System.AccessToken)
        displayName: Configure credentials to Feed
      - script: poetry install --with dev,test
        displayName: Install dependencies
      - script: poetry run pre-commit run --all-files
        displayName: Run pre-commit check
      - script: poetry run pytest tests/unit -s -vvv
        displayName: Run unit tests
      - script: curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
        displayName: Install Databricks CLI
      - script: |
          databricks bundle deploy --target staging
        env:
          DATABRICKS_HOST: $(DATABRICKS-HOST)
          DATABRICKS_TOKEN: $(DATABRICKS-TOKEN)
        displayName: Deploy the job
      - script: |
          databricks bundle run --target staging dab_job
        env:
          DATABRICKS_HOST: $(DATABRICKS-HOST)
          DATABRICKS_TOKEN: $(DATABRICKS-TOKEN)
        displayName: Launch workflow
10-15-2024 02:02 PM - edited 10-15-2024 02:08 PM
I needed to deploy a job from an Azure Pipelines CI/CD process without using OAuth login; this is the way I did it:
First you need a Service Principal configured: generate it in your workspace, and with that you will have its credentials (client ID and secret).
Next there are some prerequisites to complete:
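One of these prerequisites is making the Service Principal credentials and workspace host available to the pipeline as variables. A minimal sketch, assuming they live in an Azure DevOps variable group (the group name below is hypothetical; the variable names match the ones referenced in the pipeline further down):

variables:
  # Hypothetical variable group created under Pipelines > Library; mark the sensitive values as secret
  - group: databricks-staging-secrets  # holds DATABRICKS-HOST, YOUR-DATABRICKS-HOST, YOUR-SERVICE-PRINCIPAL-CLIENT-ID, YOUR-SERVICE-PRINCIPAL-SECRET, ARTIFACT-FEED and USERNAME-FEED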
With this configured, here is the azure-pipelines.yml file with the configuration I use:
pool: Azure Pipelines

trigger: none

pr:
  autoCancel: true
  branches:
    include:
      - main

stages:
  - stage: onDevPullRequest
    # Similar to onMainPullRequest
    ...
  - stage: onMainPullRequest
    # This stage is triggered when a PR is created into main
    # For instance, if you create a PR from feature/1.0.0 to main, this stage will be triggered
    # This stage will be skipped if the PR is created from release/* to main
    condition: |
      and(
        not(startsWith(variables['System.PullRequest.SourceBranch'], 'refs/heads/release')),
        startsWith(variables['System.PullRequest.TargetBranch'], 'refs/heads/main')
      )
    jobs:
      - job: onMainPullRequestJob
        workspace:
          clean: all
        steps:
          - task: UsePythonVersion@0
            displayName: Set up Python 3.10
            inputs:
              versionSpec: '3.10'
          - script: curl -sSL https://install.python-poetry.org | python - --version 1.8.3
            displayName: Install Poetry
          - script: poetry config http-basic.$(ARTIFACT-FEED) $(USERNAME-FEED) $(System.AccessToken)
            displayName: Configure credential with the artifact feed
          - script: poetry install --with dev,test
            displayName: Install dependencies
          - script: poetry run pre-commit run --all-files
            displayName: Run pre-commit check
          - script: poetry run pytest tests/unit -s -vvv
            displayName: Run unit tests
          - bash: |
              # Install Databricks CLI
              curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
              # Verify installation
              databricks --version
              # Create the Databricks config file ($HOME instead of a quoted "~" so the path expands)
              file="$HOME/.databrickscfg"
              # If it exists, remove it
              if [ -f "$file" ]; then
                rm "$file"
              fi
              # Define the profile
              echo "[YOUR_WORKSPACE_PROFILE]" >> "$file"
              echo "host = $(YOUR-DATABRICKS-HOST)" >> "$file"
              echo "token = $(YOUR-SERVICE-PRINCIPAL-SECRET)" >> "$file"
              # Show the file
              cat "$file"
              # Request an access token for the Service Principal and export it
              export DATABRICKS_TOKEN_SP=$(curl --request POST \
                --url https://adb-XXXXXXXXXXXXXXXX.YY.azuredatabricks.net/oidc/v1/token \
                --user "$(YOUR-SERVICE-PRINCIPAL-CLIENT-ID):$(YOUR-SERVICE-PRINCIPAL-SECRET)" \
                --data 'grant_type=client_credentials&scope=all-apis' | jq -r '.access_token'
              )
              # Export it for use in the next tasks
              echo "##vso[task.setvariable variable=DATABRICKS_TOKEN_SP]$DATABRICKS_TOKEN_SP"
            displayName: Install Databricks CLI and create config file
          - script: |
              databricks bundle deploy --target staging
            env:
              DATABRICKS_HOST: $(DATABRICKS-HOST)
              DATABRICKS_TOKEN: $(DATABRICKS_TOKEN_SP)
            displayName: Deploy the job
          - script: |
              databricks bundle run --target staging dab_your_workflow
            env:
              DATABRICKS_HOST: $(DATABRICKS-HOST)
              DATABRICKS_TOKEN: $(DATABRICKS_TOKEN_SP)
            displayName: Launch workflow
  - stage: onRelease
    # Similar to onMainPullRequest
    ...
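A note on the "Install Databricks CLI and create config file" step: the call to the /oidc/v1/token endpoint is a standard OAuth client-credentials request, and jq extracts the access_token field from its JSON response. The response body should look roughly like this (placeholder values; the exact fields may vary):

{"access_token": "eyJraWQiOi...", "token_type": "Bearer", "expires_in": 3600, "scope": "all-apis"}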
Steps:
This way we make sure the same environment variable DATABRICKS_TOKEN_SP is used in the next tasks. Also, we don't have to configure interactive OAuth for the CI/CD, which is a great step toward user-independent CI/CD processes.
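The ##vso[task.setvariable] logging command is what makes that handoff possible: a variable set this way in one script becomes available to the following tasks of the same job through the $( ) macro syntax. A minimal illustration, separate from the pipeline above (step names and values are made up):

- bash: |
    MY_TOKEN="example-value"
    # task.setvariable makes MY_TOKEN visible to later tasks in this job as $(MY_TOKEN)
    echo "##vso[task.setvariable variable=MY_TOKEN]$MY_TOKEN"
  displayName: Set a variable for later tasks
- bash: echo "The next task sees $(MY_TOKEN)"
  displayName: Read the variable in a later task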
If you need a template for the databricks.yml used here, this is it:
bundle:
  name: dab_your_workflow

# Declare to Databricks Asset Bundles that this is a Python project
# This is the interaction with the "pyproject.toml" file
artifacts:
  default:
    type: whl
    build: poetry build
    path: .

resources:
  jobs:
    dab_your_workflow:
      name: dab_your_workflow
      tasks:
        - task_key: your_workflow_task
          job_cluster_key: ${bundle.target}-${bundle.name}-job-cluster
          python_wheel_task:
            package_name: dab_your_workflow
            entry_point: your_workflow_entry_point
            parameters:
              - --conf-file
              - "/Workspace${workspace.root_path}/files/conf/tasks/your_workflow_task_config.yml"
              - --env
              - ${bundle.target}
          libraries:
            - whl: ./dist/*.whl

targets:
  dev:
    # Similar to staging
    ...
  prod:
    # Similar to staging
    ...
  staging:
    mode: production
    workspace:
      host: https://adb-XXXXXXXXXXXXXXXX.YY.azuredatabricks.net/
    run_as:
      service_principal_name: AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE
    resources:
      jobs:
        dab_your_workflow:
          job_clusters:
            - job_cluster_key: ${bundle.target}-${bundle.name}-job-cluster
              new_cluster:
                num_workers: 2
                spark_version: "14.3.x-cpu-ml-scala2.12" # Specify the Spark version
                spark_conf:
                  # You can specify the Spark configuration here
                node_type_id: Standard_F8 # Specify the node type
                spark_env_vars:
                  # You can specify Spark environment variables, for example:
                  PIP_EXTRA_INDEX_URL: "{{secrets/kv-your-key-vault/your-url-for-pip-extra-index-url}}"
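If you want to sanity-check the same bundle from your own machine before wiring it into the pipeline, the Databricks CLI can be pointed at a local profile (assuming a working profile named YOUR_WORKSPACE_PROFILE exists in your local ~/.databrickscfg, for example configured as in the post linked in the question):

# Validate the bundle definition against the staging target
databricks bundle validate --target staging --profile YOUR_WORKSPACE_PROFILE
# Deploy it using the same profile
databricks bundle deploy --target staging --profile YOUR_WORKSPACE_PROFILE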
I believe there are ways to "improve this bonsai answer", so if you see anything to improve, please comment.