Data Engineering
DBT Job Type: Authenticating to Azure DevOps for git_source

rsamant07
New Contributor III

We are trying to execute Databricks jobs with the dbt task type, but they are failing to authenticate to Git. The problem is that the job is created using a service principal, and the service principal doesn't seem to have access to the repo.

A few questions we have:

1) Can we give the service principal access to the Azure repo?

2) Can we edit the "run_as_user_name" property after the job is created, or can we submit a run without creating a job while specifying "run_as_user_name" explicitly?


11 REPLIES

Anonymous
Not applicable

@Rahul Samant​ :

Yes, you can grant the service principal access to the Azure DevOps repository. You will need to add the service principal to the repository's security group or team with the necessary permissions.
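This can also be scripted with the Azure DevOps CLI extension; a sketch, not verified against your setup (the organization URL, group descriptor, and service principal identifier are placeholders, and the group descriptor can be looked up with "az devops security group list"):

az devops security group membership add \
  --organization "https://dev.azure.com/<org>" \
  --group-id "<group-descriptor>" \
  --member-id "<service-principal-descriptor-or-email>"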

You can edit the properties of a job after it is created, including the "run_as_user_name" property. To do this, you can follow these steps:

  • Go to the Databricks workspace and navigate to the "Jobs" tab
  • Find the job you want to edit and click on it to open the job details
  • Click on the "Edit" button in the top right corner
  • Make the necessary changes to the job properties, including the "run_as_user_name" property
  • Click "Save" to save the changes

Alternatively, you can submit a run without creating a job and specify the "run_as_user_name" property explicitly.

rsamant07
New Contributor III

Thanks @Suteja Kanuri​ for your response, it's helpful.

Can you share how to pass run_as_user_name as a parameter to run a job without creating one? I was using DatabricksSubmitRunOperator in Airflow and passing the JSON below as a param, but it is not honored. It is not honored through the CLI either; maybe my JSON format for the run_as_user_name param is incorrect?

 "json": {

          "existing_cluster_id": "{{ env.existing_cluster_id }}",

          "dbt_task": {

            "project_directory": "",

            "commands": [

              "dbt deps",

              "dbt run --select models/ZZZZ"

            ],

            "schema": "kpi",

            "warehouse_id": "XXXXXX"

          },

          "git_source": {

            "git_url": "https://axxxx-dbt",

            "git_provider": "azureDevOpsServices",

            "git_branch": "master"

          },

          "run_as_user_name": "user@test.com"

        }

Anonymous
Not applicable

@Rahul Samant​ :

you can submit the run without creating a job and specify "run_as_user_name" explicitly by including it in the payload of a Runs Submit call. For example, using the legacy Databricks CLI:

databricks runs submit --json-file payload.json

Here payload.json contains the full run specification (cluster, dbt_task, git_source) together with the "run_as_user_name" field.
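Equivalently, the raw REST call; a minimal sketch where the workspace host and token are placeholders and payload.json is a run spec such as the JSON shown earlier in this thread:

curl -X POST "https://<workspace-host>/api/2.1/jobs/runs/submit" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -H "Content-Type: application/json" \
  --data @payload.json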

Hope this helps!

Anonymous
Not applicable

@Rahul Samant​ :

Based on the JSON payload you provided, it looks like you are using the DatabricksSubmitRunOperator in Airflow to submit a Databricks job run. To pass the run_as_user_name parameter, you can add it as a top-level parameter in the json dictionary like this:

"json": {
    "existing_cluster_id": "{{ env.existing_cluster_id }}",
    "dbt_task": {
        "project_directory": "",
        "commands": [
            "dbt deps",
            "dbt run --select models/ZZZZ"
        ],
        "schema": "kpi",
        "warehouse_id": "XXXXXX"
    },
    "git_source": {
        "git_url": "https://axxxx-dbt",
        "git_provider": "azureDevOpsServices",
        "git_branch": "master"
    },
    "run_as_user_name": "user@test.com"
}

Make sure that the value of run_as_user_name is a valid Databricks user name or email address with the appropriate permissions to access the necessary resources.
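For reference, a minimal Airflow DAG using DatabricksSubmitRunOperator with this payload might look like the sketch below. The connection id and cluster id are placeholders, and whether run_as_user_name is honored depends on which Jobs API version your provider targets:

from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

# Hypothetical payload mirroring the JSON above; ids and names are placeholders.
run_payload = {
    "existing_cluster_id": "<cluster-id>",
    "dbt_task": {
        "project_directory": "",
        "commands": ["dbt deps", "dbt run --select models/ZZZZ"],
        "schema": "kpi",
        "warehouse_id": "XXXXXX",
    },
    "git_source": {
        "git_url": "https://axxxx-dbt",
        "git_provider": "azureDevOpsServices",
        "git_branch": "master",
    },
    "run_as_user_name": "user@test.com",
}

with DAG("dbt_submit_run", start_date=datetime(2023, 1, 1), schedule_interval=None) as dag:
    submit_dbt_run = DatabricksSubmitRunOperator(
        task_id="submit_dbt_run",
        databricks_conn_id="databricks_default",  # Airflow connection to the workspace
        json=run_payload,  # sent as the body of the runs/submit request
    )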

rsamant07
New Contributor III

Hi @Suteja Kanuri​, I tried this approach but it doesn't seem to work; the run still uses the default user who submitted it. I also tried the access control list along with run_as_user_name, but neither way worked:

"access_control_list": [

                        {

                            "user_name": "test@com",

                            "permission_level": "IS_OWNER"

                        }

                    ],

"run_as_user_name": "test@com"

Anonymous
Not applicable

@Rahul Samant​ :

I see, it's possible that the "run_as_user_name" property may not be working as expected. In this case, you can try setting up a Databricks secret for the Git credentials and use it in your dbt_project.yml file.

Here are the steps to set up a Databricks secret for Git credentials:

  1. Navigate to the Databricks workspace and click on "Secrets" in the left-hand sidebar.
  2. Click on the "New Secret Scope" button and create a new scope for your Git credentials.
  3. Click on the newly created scope and then click on "Add" to add a secret. Enter the Git credentials as key-value pairs.
  4. In your dbt_project.yml file, replace the Git credentials with the secret reference. Here's an example:
repositories:
  - name: my_repo
    package: dbt-mssql
    revision: master
    url: git@github.com:my_org/my_repo.git
    depends_on:
      - package: jaffle_shop
    git:
      auth:
        type: ssh
        ssh_key: /path/to/ssh/key
        known_hosts: /path/to/known_hosts/file
      secret_name: my_secret_scope_name
      secret_key: my_git_credentials_key

  5. Save the changes to your dbt_project.yml file and run your job. The Git credentials will be retrieved from the Databricks secret and used for authentication.
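The secret setup can also be done with the legacy Databricks CLI (v0.x) instead of the UI; a sketch reusing the scope and key names from the example above:

databricks secrets create-scope --scope my_secret_scope_name
databricks secrets put --scope my_secret_scope_name \
    --key my_git_credentials_key --string-value "<git-pat-or-key>"
databricks secrets list --scope my_secret_scope_name   # verify the key exists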

Let me know if this helps or if you have any further questions!

rsamant07
New Contributor III

Hi @Suteja Kanuri​. It seems access_control_list is working fine with the 2.1 API and run_as_user_name is updated with it, but on our Airflow platform using the 2.0 API we are facing an issue and this parameter is ignored. We want to test the Git credential solution you mentioned above, but I see the configuration asks for ssh_key as well as a secret. Can it authenticate with just a PAT, or do we also need SSH keys? Can we remove the ssh_key and known_hosts lines below and only provide the secret keys?

git:
  auth:
    type: ssh
    ssh_key: /path/to/ssh/key
    known_hosts: /path/to/known_hosts/file
  secret_name: my_secret_scope_name
  secret_key: my_git_credentials_key

Anonymous
Not applicable
(Accepted Solution)

@Rahul Samant​ :

It is not possible to edit the "run_as_user_name" property after the job is created. However, you can submit the run without creating a job by using the Databricks API directly and passing the "run_as_user_name" parameter explicitly. If you are using the Databricks 2.0 API, the "run_as_user_name" parameter may not be supported, and you may need to upgrade to the Databricks 2.1 API.

You can authenticate using a Personal Access Token (PAT) or an SSH key. If you are using a PAT, you don't need to provide the SSH key and known_hosts files. The Git configuration should look like this:

git:
  auth:
    type: token
    token: my_git_credentials_token

You can provide the PAT value directly in the "token" field.

rsamant07
New Contributor III

Thanks @Suteja Kanuri​ for your advice on the issue; I am able to get it working by calling the Databricks 2.1 API explicitly from Airflow. Just out of curiosity: can the PAT token be provided from a secret scope as well? Can you share any documentation on how to authenticate to Git from dbt_project.yml?

Anonymous
Not applicable

@Rahul Samant​ : Answering both your questions.

The dbt documentation has a section on managing project dependencies that includes information on how to authenticate to private Git repositories - https://docs.getdbt.com/docs/build/packages#managing-project-dependencies

Yes, you can provide a Personal Access Token (PAT) from a Databricks secret scope for authentication purposes.
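One way to wire the two together, sketched under assumptions (the scope and key names reuse the examples above; the repository URL is a placeholder): expose the secret as a cluster environment variable using Databricks' {{secrets/<scope>/<key>}} reference syntax, then read it in packages.yml with dbt's env_var() function.

# Cluster configuration > Advanced options > Environment variables
# Databricks resolves the secret reference when the cluster starts.
GIT_TOKEN={{secrets/my_secret_scope_name/my_git_credentials_key}}

# packages.yml - dbt interpolates the variable when parsing the project
packages:
  - git: "https://{{env_var('GIT_TOKEN')}}@dev.azure.com/<org>/<project>/_git/<repo>"
    revision: master

dbt also scrubs environment variables prefixed with DBT_ENV_SECRET_ from its logs, which is worth considering when the value is a credential.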

Anonymous
Not applicable

Hi @Rahul Samant​ 

I'm sorry you could not find a solution to your problem in the answers provided.

Our community strives to provide helpful and accurate information, but sometimes an immediate solution may not be available for every issue.

I suggest providing more information about your problem, such as specific error messages, error logs or details about the steps you have taken. This can help our community members better understand the issue and provide more targeted solutions.

Alternatively, you can consider contacting the support team for your product or service. They may be able to provide additional assistance or escalate the issue to the appropriate team for further investigation.

Thank you for your patience and understanding, and please let us know if there is anything else we can do to assist you.
