09-24-2024 02:11 PM
How can I connect using a service principal token? I tried this, but what I have is not a PAT:
databricks configure
Databricks host: https:// ...
Personal access token: ****
I also tried this, but it didn't work either:
[profile]
host = <workspace-url>
client_id = <service-principal-client-id>
client_secret = <service-principal-secret>
I also tried this way, but nothing (just in case):
databricks configure --aad-token
How can I configure my Databricks workspace so I can deploy a DAB to it using the service principal token (to avoid relying on PATs)?
Best regards, and thank you
#DAB #DatabricksAssetsBundle #ServicePrincipal
09-24-2024 09:29 PM
Hi @PabloCSD,
Long story short: you can watch this video, where I go step by step through how to set up a service principal in Azure, grant it permissions to the workspace, and have it generate a token for itself via machine-to-machine authentication in the Databricks CLI.
These are the steps you need to take to deploy your bundle using a service principal:
1. Add the service principal to your Databricks account.
2. Give that service principal administration rights on the workspace where you want to deploy the DAB.
3. Generate a PAT (personal access token) for the service principal, which you can do in one of two ways:
3.a. Via machine-to-machine authentication, where the service principal generates a PAT for itself. I demonstrate this in the video.
3.b. By generating a token for the service principal using the "on behalf of" option, provided the principal generating the token has at least workspace administration rights. There is a solution for this option on this post, and a sketch right below this list.
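For illustration, here is a minimal sketch of option 3.b using the Token Management REST API (the endpoint and fields follow the documented "on-behalf-of" token API; the host, admin token, application ID, and lifetime are placeholders):
# Mint a PAT on behalf of the service principal.
# The caller must have workspace admin rights.
curl --request POST \
  https://adb-0000000000000000.7.azuredatabricks.net/api/2.0/token-management/on-behalf-of/tokens \
  --header "Authorization: Bearer <workspace-admin-token>" \
  --data '{
    "application_id": "<service-principal-application-id>",
    "comment": "token for DAB deployments",
    "lifetime_seconds": 3600
  }'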
To deploy your bundle using the CLI, you will use the command:
databricks bundle deploy -t <target-name> -p <sp-profile>
The service principal profile needs to have your service principal configured in your ~/.databrickscfg file, either with machine-to-machine (OAuth) credentials or with a PAT (personal access token).
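For reference, the two profile styles could look roughly like this (the profile names are just examples; all values are placeholders):
[sp-m2m]
host          = https://adb-0000000000000000.7.azuredatabricks.net
client_id     = <service-principal-application-id>
client_secret = <service-principal-oauth-secret>
[sp-pat]
host  = https://adb-0000000000000000.7.azuredatabricks.net
token = <service-principal-pat>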
Hope this helps.
Let me know if this solves your issue. If you have any other questions, I am here to help.
Regards
Pedro
09-24-2024 10:45 PM
Just adding the documentation about authentication -> https://learn.microsoft.com/en-us/azure/databricks/dev-tools/cli/authentication#m2m-auth
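Per those docs, the same M2M credentials can also be supplied through environment variables instead of a config-file profile (the values below are placeholders):
export DATABRICKS_HOST=https://adb-0000000000000000.7.azuredatabricks.net
export DATABRICKS_CLIENT_ID=<service-principal-application-id>
export DATABRICKS_CLIENT_SECRET=<service-principal-oauth-secret>
databricks bundle deploy -t dev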
09-26-2024 07:03 AM
Thanks Pedro, we got it working. For anyone in the future (I replaced the host and service principal IDs with fake values):
1. Modify your databricks.yml so it has the service principal ID and the Databricks host:
bundle:
  name: my_workflow

# Declare to Databricks Asset Bundles that this is a Python project
# (this is the interaction with the "pyproject.toml" file)
artifacts:
  default:
    type: whl
    build: poetry build
    path: .

resources:
  jobs:
    my_workflow:
      name: my_workflow
      job_clusters:
        - job_cluster_key: ${bundle.target}-${bundle.name}-job-cluster
          new_cluster:
            num_workers: 2
            spark_version: "15.3.x-cpu-ml-scala2.12"
            node_type_id: Standard_DS3_v2
      tasks:
        - task_key: my_workflow_pipeline_task
          job_cluster_key: ${bundle.target}-${bundle.name}-job-cluster
          python_wheel_task:
            package_name: my_workflow
            entry_point: my_workflow_pipeline_task
          libraries:
            - whl: ./dist/*.whl
      permissions:
        # If you are using a group, you need to create it in the Databricks workspace
        - group_name: "my_group_name"
          level: "CAN_MANAGE"

targets:
  dev:
    mode: development
    default: true
    workspace:
      # Put the associated workspace URL here
      host: https://adb-0000000000000000.7.azuredatabricks.net
    run_as:
      # Put the associated service principal's application ID here
      service_principal_name: 76w4hdge-39a2-0303-45c7-udnr93kvp03f
    resources:
      jobs:
        my_workflow:
          job_clusters:
            - job_cluster_key: ${bundle.target}-${bundle.name}-job-cluster
              new_cluster:
                num_workers: 2
                spark_version: "15.3.x-cpu-ml-scala2.12"
                node_type_id: Standard_DS3_v2
          permissions:
            # If you are using a group, you need to create it in the Databricks workspace
            - group_name: "my_group_name"
              level: "CAN_MANAGE"
2. Create a .databrickscfg file in your home directory (the CLI looks for ~/.databrickscfg by default) with the following information:
[my_workflow]
host          = https://adb-0000000000000000.7.azuredatabricks.net/
client_id     = 76w4hdge-39a2-0303-45c7-udnr93kvp03f
client_secret = tomatoes***************spinach
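You can confirm that the profile authenticates correctly before deploying (both commands are part of the current Databricks CLI):
databricks auth profiles
databricks current-user me --profile my_workflow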
3. In the terminal, just run:
databricks bundle deploy --profile my_workflow
If everything was done correctly, this should be the output:
(.venv) oishiiramen@3301 my_directory % databricks bundle deploy --profile my_workflow
Building default...
Uploading my_workflow-0.1.1-py3-none-any.whl...
Uploading bundle files to /Users/76w4hdge-39a2-0303-45c7-udnr93kvp03f/.bundle/my_workflow/dev/files...
Deploying resources...
Updating deployment state...
Deployment complete!
If the .databrickscfg file was not created, an error like this can appear:
(.venv) oishiiramen@3301 my_directory % databricks bundle deploy --profile my_workflow
Error: cannot resolve bundle auth configuration: cannot parse config file: open /Users/oishiiramen/.databrickscfg: no such file or directory
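Side note: if your config file lives somewhere other than your home directory, you can point the CLI at it explicitly (DATABRICKS_CONFIG_FILE is a documented CLI setting; the path below is a placeholder):
export DATABRICKS_CONFIG_FILE=/path/to/.databrickscfg
databricks bundle deploy --profile my_workflow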