03-22-2023 09:12 AM
We are trying to execute Databricks jobs with the dbt task type, but they are failing to authenticate to Git. The problem is that the job is created using a service principal, and the service principal does not seem to have access to the repo.
A few questions we have:
1) Can we give the service principal access to the Azure repo?
2) Can we edit the "run_as_user_name" property after the job is created, or can we submit a run without creating a job and give "run_as_user_name" explicitly?
03-22-2023 09:45 PM
@Rahul Samant :
Yes, you can grant access to the service principal to the Azure DevOps repository. You will need to add the service principal to the repository's security group or team with the necessary permissions.
You can edit the properties of a job after it is created, including the "run_as_user_name" property.
Alternatively, you can submit a run without creating a job and specify the "run_as_user_name" property explicitly.
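To make "submit a run without creating a job" concrete, here is a minimal sketch of the runs-submit payload, assuming the workspace honors a top-level "run_as_user_name" field (the later posts in this thread found that depends on the Jobs API version); host, token, cluster ID, and URLs are placeholders:

```python
import json

def build_submit_payload(cluster_id, commands, warehouse_id, git_url, run_as_user):
    # Payload shape mirrors the JSON shown later in this thread; whether
    # "run_as_user_name" is honored depends on the Jobs API version used.
    return {
        "existing_cluster_id": cluster_id,
        "dbt_task": {
            "project_directory": "",
            "commands": list(commands),
            "warehouse_id": warehouse_id,
        },
        "git_source": {
            "git_url": git_url,
            "git_provider": "azureDevOpsServices",
            "git_branch": "master",
        },
        "run_as_user_name": run_as_user,
    }

# The actual call would POST this to the runs/submit endpoint, e.g.:
#   requests.post(f"https://{host}/api/2.1/jobs/runs/submit",
#                 headers={"Authorization": f"Bearer {pat}"},
#                 data=json.dumps(payload))
payload = build_submit_payload("1234-567890-abcdefgh",
                               ["dbt deps", "dbt run"],
                               "XXXXXX",
                               "https://example.visualstudio.com/_git/repo",
                               "user@test.com")
print(json.dumps(payload, indent=2))
```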
03-23-2023 01:32 AM
Thanks @Suteja Kanuri for your response, it's helpful.
Can you share how to pass run_as_user_name in the parameters to run a job without creating one? I was using DatabricksSubmitRunOperator in Airflow and passing the JSON below as a param, but it isn't considering it. Through the CLI it isn't considering it either; maybe my JSON format for the run_as_user_name param is not correct?
"json": {
    "existing_cluster_id": "{{ env.existing_cluster_id }}",
    "dbt_task": {
        "project_directory": "",
        "commands": [
            "dbt deps",
            "dbt run --select models/ZZZZ"
        ],
        "schema": "kpi",
        "warehouse_id": "XXXXXX"
    },
    "git_source": {
        "git_url": "https://axxxx-dbt",
        "git_provider": "azureDevOpsServices",
        "git_branch": "master"
    },
    "run_as_user_name": "user@test.com"
}
03-23-2023 04:28 AM
@Rahul Samant :
You can submit the run without creating a job and specify the "run_as_user_name" explicitly. You can do this using the Databricks CLI or API by specifying the user name in the command or API call. For example, using the Databricks CLI, you can submit a job run with the following command:
databricks jobs run-now --job-id <job-id> --notebook-params <parameters> --run-as <user-name>
Replace <job-id>, <parameters>, and <user-name> with the appropriate values for your job.
Hope this helps!
03-23-2023 04:32 AM
@Rahul Samant :
Based on the JSON payload you provided, it looks like you are using the DatabricksSubmitRunOperator in Airflow to submit a Databricks job run. To pass the run_as_user_name parameter, you can add it as a top-level parameter in the json dictionary like this:
"json": {
    "existing_cluster_id": "{{ env.existing_cluster_id }}",
    "dbt_task": {
        "project_directory": "",
        "commands": [
            "dbt deps",
            "dbt run --select models/ZZZZ"
        ],
        "schema": "kpi",
        "warehouse_id": "XXXXXX"
    },
    "git_source": {
        "git_url": "https://axxxx-dbt",
        "git_provider": "azureDevOpsServices",
        "git_branch": "master"
    },
    "run_as_user_name": "user@test.com"
}
Make sure that the value of run_as_user_name is a valid Databricks user name or email address with the appropriate permissions to access the necessary resources.
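For concreteness, here is a hedged sketch of how such a payload would be handed to the operator. The DAG wiring is hypothetical, and the operator import is shown in a comment so the snippet stays runnable without an Airflow installation:

```python
# Hypothetical Airflow task using the Databricks provider.  The import
# and operator call are commented out so the payload itself can be
# inspected without Airflow installed:
#
#   from airflow.providers.databricks.operators.databricks import (
#       DatabricksSubmitRunOperator,
#   )
#   dbt_run = DatabricksSubmitRunOperator(
#       task_id="dbt_run",
#       databricks_conn_id="databricks_default",
#       json=run_payload,
#   )

run_payload = {
    "existing_cluster_id": "1234-567890-abcdefgh",  # placeholder
    "dbt_task": {
        "project_directory": "",
        "commands": ["dbt deps", "dbt run --select models/ZZZZ"],
        "schema": "kpi",
        "warehouse_id": "XXXXXX",
    },
    "git_source": {
        "git_url": "https://example.visualstudio.com/_git/repo",  # placeholder
        "git_provider": "azureDevOpsServices",
        "git_branch": "master",
    },
    # Top level, alongside the task definition -- not nested inside it:
    "run_as_user_name": "user@test.com",
}
print(sorted(run_payload))
```

The key point is that run_as_user_name sits at the same level as dbt_task and git_source, not inside either of them.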
03-23-2023 08:13 AM
Hi @Suteja Kanuri, I tried this approach but it doesn't seem to be working: it takes the default user who submitted the run. I also tried using the access control list along with run_as_user_name, but neither way worked:
"access_control_list": [
    {
        "user_name": "test@com",
        "permission_level": "IS_OWNER"
    }
],
"run_as_user_name": "test@com"
04-01-2023 09:10 PM
@Rahul Samant :
I see, it's possible that the "run_as_user_name" property may not be working as expected. In this case, you can try setting up a Databricks secret for the Git credentials and use it in your dbt_project.yml file.
Here are the steps to set up a Databricks secret for Git credentials:
repositories:
  - name: my_repo
    package: dbt-mssql
    revision: master
    url: git@github.com:my_org/my_repo.git
    depends_on:
      - package: jaffle_shop
git:
  auth:
    type: ssh
    ssh_key: /path/to/ssh/key
    known_hosts: /path/to/known_hosts/file
    secret_name: my_secret_scope_name
    secret_key: my_git_credentials_key
5. Save the changes to your dbt_project.yml file and run your job. The Git credentials will be retrieved from the Databricks secret and used for authentication.
6. Let me know if this helps or if you have any further questions!
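As a side note, in recent dbt versions the documented pattern for authenticating to a private package repository is to embed a credential environment variable in the git URL inside packages.yml, rather than a separate git/auth block; a hedged sketch, where the organization, project, repo, and variable names are placeholders:

```yaml
# packages.yml -- hypothetical private-package entry; the env var must
# be set on the cluster that runs dbt.
packages:
  - git: "https://{{env_var('DBT_ENV_SECRET_GIT_PAT')}}@dev.azure.com/my_org/my_project/_git/my_repo"
    revision: master
```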
04-11-2023 08:40 AM
Hi @Suteja Kanuri. It seems access_control_list is working fine with the 2.1 API, and run_as_user_name is updated with it, but on our Airflow platform using the 2.0 API we are facing an issue and this parameter is ignored. We want to test the Git credential solution that you mentioned above, but I see that in the configuration you are asking for ssh_key as well as the secret. Can it be authenticated by just a PAT, or do we also need SSH keys? Can we remove the ssh_key and known_hosts lines and only provide the credential keys?
04-14-2023 09:57 AM
@Rahul Samant :
It is not possible to edit the "run_as_user_name" property after the job is created. However, you can submit the run without creating a job by using the Databricks API directly and passing the "run_as_user_name" parameter explicitly. If you are using the Databricks 2.0 API, the "run_as_user_name" parameter may not be supported, and you may need to upgrade to the Databricks 2.1 API.
You can authenticate using a Personal Access Token (PAT) or an SSH key. If you are using a PAT, you don't need to provide the SSH key and known_hosts files. The Git configuration should look like this:
git:
  auth:
    type: token
    token: my_git_credentials_token
You can provide the PAT value directly in the "token" field.
04-14-2023 10:17 AM
Thanks @Suteja Kanuri for your advice on the issue. I am able to get it working by calling the Databricks 2.1 API explicitly from Airflow. Just out of curiosity: can the PAT token be provided from a secret scope as well? Can you share any documentation on how to authenticate to Git from dbt_project.yml?
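For anyone landing here later, the combination that worked against the 2.1 API can be sketched as a small payload helper; the cluster ID and user are placeholders:

```python
import json

def with_run_as(payload: dict, user: str) -> dict:
    # Attach both knobs experimented with in this thread: an IS_OWNER
    # entry in access_control_list (honored by the 2.1 Jobs API per the
    # post above) plus the top-level run_as_user_name field.
    out = dict(payload)
    out["access_control_list"] = [
        {"user_name": user, "permission_level": "IS_OWNER"}
    ]
    out["run_as_user_name"] = user
    return out

base = {"existing_cluster_id": "1234-567890-abcdefgh"}  # placeholder
print(json.dumps(with_run_as(base, "user@test.com"), indent=2))
```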
04-17-2023 06:06 AM
@Rahul Samant : Answering both your questions.
The dbt documentation has a section on managing project dependencies that includes information on how to authenticate to private Git repositories - https://docs.getdbt.com/docs/build/packages#managing-project-dependencies
Yes, you can provide a Personal Access Token (PAT) from a Databricks secret scope for authentication purposes.
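One way to avoid hard-coding the PAT is Databricks' secret-reference placeholder syntax for cluster environment variables and Spark conf; a small sketch, assuming a scope and key you have already created via the Secrets CLI or API (names here are hypothetical):

```python
def secret_ref(scope: str, key: str) -> str:
    # Databricks resolves this placeholder at cluster launch when it is
    # used in cluster environment variables or Spark configuration.
    return f"{{{{secrets/{scope}/{key}}}}}"

# dbt treats env vars prefixed with DBT_ENV_SECRET_ as secrets and
# scrubs them from logs; scope/key names are placeholders.
env_vars = {"DBT_ENV_SECRET_GIT_PAT": secret_ref("my_scope", "git_pat")}
print(env_vars["DBT_ENV_SECRET_GIT_PAT"])
```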
03-29-2023 10:42 PM
Hi @Rahul Samant
I'm sorry you could not find a solution to your problem in the answers provided.
Our community strives to provide helpful and accurate information, but sometimes an immediate solution may only be available for some issues.
I suggest providing more information about your problem, such as specific error messages, error logs or details about the steps you have taken. This can help our community members better understand the issue and provide more targeted solutions.
Alternatively, you can consider contacting the support team for your product or service. They may be able to provide additional assistance or escalate the issue to the appropriate section for further investigation.
Thank you for your patience and understanding, and please let us know if there is anything else we can do to assist you.