Generate longer token for Databricks with Azure.

Etyr
Contributor

I'm using DefaultAzureCredential from azure-identity to connect to Azure with service principal environment variables (AZURE_CLIENT_SECRET, AZURE_TENANT_ID, AZURE_CLIENT_ID).

I can call get_token with the Databricks scope like this:

from azure.identity import DefaultAzureCredential
 
dbx_scope = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default"
token = DefaultAzureCredential().get_token(dbx_scope).token

This works well: I get the token and can then use `databricks-connect` to configure my connection to the cluster. That generates a configuration file ($HOME/.databricks-connect) that tells Spark where to connect and which token to use:

{
  "host": "https://adb-1234.azuredatabricks.net",
  "token": "eyJ0eXAiXXXXXXXXXXXXXXXXXXXXXx",
  "cluster_id": "1234",
  "org_id": "1234",
  "port": "15001"
}

The issue is that this token is short-lived. When I use Spark for more than an hour, I get disconnected because the token has expired.
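
For reference, the object returned by get_token carries the expiry next to the token string: azure-identity returns an AccessToken with `token` and `expires_on` (a Unix timestamp in seconds). Below is a minimal sketch of checking how much lifetime is left. The helper names `seconds_until_expiry` and `needs_refresh` and the 5-minute margin are illustrative assumptions, not part of any library.

```python
import time
from typing import Optional


def seconds_until_expiry(expires_on: int, now: Optional[float] = None) -> float:
    """Return how many seconds remain before a token expires.

    expires_on is a Unix timestamp in seconds, as found on the AccessToken
    returned by DefaultAzureCredential().get_token(...).
    """
    current = time.time() if now is None else now
    return expires_on - current


def needs_refresh(expires_on: int, margin_seconds: int = 300,
                  now: Optional[float] = None) -> bool:
    """True when the token expires within margin_seconds (assumed: 5 minutes)."""
    return seconds_until_expiry(expires_on, now) <= margin_seconds


# Usage with the snippet above (requires azure-identity and valid credentials):
#   access_token = DefaultAzureCredential().get_token(dbx_scope)
#   print(seconds_until_expiry(access_token.expires_on))
```

This only reports the remaining lifetime; it does not extend it.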

Is there a way to get a longer-lived token for Databricks with a service principal? Since this is intended for production, I would like my code to generate a PAT for each run; I don't want to create a PAT manually and store it in Azure Key Vault.

1 ACCEPTED SOLUTION

Etyr
Contributor

I came up with an alternative solution: I wrote my own Python class to manage my Databricks PAT: https://stackoverflow.com/questions/75071869/python-defaultazurecredential-get-token-set-expiration-...

You can be fancier and even register an atexit handler inside the class to destroy the PAT, but this has a side effect. The Python process still exits without an error code, but if you have a logger, it will warn you that the connection to Databricks was closed because of an invalid token. That is "normal", but ugly.
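
To illustrate the atexit idea, here is a minimal sketch. It is not the class from the linked Stack Overflow answer. `ManagedPat`, `create_pat` and `revoke_pat` are hypothetical names: the two callables are supplied by the caller and are assumed to wrap whatever Databricks token API calls you use.

```python
import atexit


class ManagedPat:
    """Create a PAT on construction and revoke it when the interpreter exits.

    create_pat and revoke_pat are caller-supplied callables (hypothetical):
    create_pat() returns (token_id, token_value) and revoke_pat(token_id)
    deletes the token.
    """

    def __init__(self, create_pat, revoke_pat):
        self._revoke_pat = revoke_pat
        self._revoked = False
        self.token_id, self.token_value = create_pat()
        # Run the cleanup automatically at normal interpreter shutdown.
        atexit.register(self.revoke)

    def revoke(self):
        """Revoke the PAT once; later calls are no-ops."""
        if not self._revoked:
            self._revoke_pat(self.token_id)
            self._revoked = True
```

Injecting the callables keeps the class independent of any particular HTTP client, and the `_revoked` flag means calling `revoke()` manually before exit does not trigger a second delete.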

4 REPLIES

Hubert-Dudek
Esteemed Contributor III

There is a REST API endpoint for managing tokens:

https://docs.databricks.com/dev-tools/api/latest/token-management.html

With your code you already have the host and a short-lived token, so all you need to do is call the REST API to generate a long-lived token.

Create a token on behalf of a service principal. >> https://docs.databricks.com/dev-tools/api/latest/token-management.html#operation/create-obo-token
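
As a rough sketch of that call, the function below builds the request using only the standard library. The path and the body fields (`application_id`, `lifetime_seconds`, `comment`) follow the Token Management API linked above as far as I know; the function name, the default lifetime and the default comment are my own assumptions. The caller also needs sufficient permissions in the workspace.

```python
import json
import urllib.request


def build_obo_token_request(host, bearer_token, application_id,
                            lifetime_seconds=86400, comment="created by job"):
    """Build (but do not send) the POST request that asks Databricks for a
    token on behalf of a service principal."""
    url = host.rstrip("/") + "/api/2.0/token-management/on-behalf-of/tokens"
    payload = {
        "application_id": application_id,
        "lifetime_seconds": lifetime_seconds,
        "comment": comment,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # bearer_token is the short-lived Azure AD token obtained earlier.
            "Authorization": "Bearer " + bearer_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending it (needs network access and sufficient permissions):
#   with urllib.request.urlopen(request) as response:
#       pat = json.loads(response.read())["token_value"]
```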

Etyr
Contributor

The issue with this (I think) is that it will create a new token for each run of my code in Azure ML. So if I get past 600 runs, I will have generated 600 PATs, which is the Databricks limit. Later runs won't be able to create new tokens and will get stuck.

Is there a way to remove "old" PATs, for example PATs that are older than 24 hours?

I was also thinking of a solution that keeps the short-lived token: every X minutes I ask for a new token. But then I have to re-initialize my SparkSession and lose all the work. Isn't there a way to inject the token through the Spark configuration?

Something like this:

spark_session.conf.set("spark.some.option.token", new_token)

Hubert-Dudek
Esteemed Contributor III

There are API calls to delete and manage tokens, so you can implement your own logic.
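
A minimal sketch of such logic, assuming the token list comes back as a list of dicts with a `token_id` and a `creation_time` in epoch milliseconds (the shape I understand `GET /api/2.0/token/list` to return under `token_infos`). `stale_token_ids` is a hypothetical helper; it only selects the tokens, and each selected ID would then be passed to the delete call.

```python
import time


def stale_token_ids(token_infos, max_age_hours=24, now_ms=None):
    """Return the token_id of every token created more than max_age_hours ago.

    token_infos: list of dicts with "token_id" and "creation_time"
    (assumed to be milliseconds since the Unix epoch).
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    cutoff_ms = now_ms - max_age_hours * 3600 * 1000
    return [
        info["token_id"]
        for info in token_infos
        if info["creation_time"] < cutoff_ms
    ]


# Each returned ID could then be sent to the delete endpoint, e.g.
# POST /api/2.0/token/delete with body {"token_id": "<id>"} (assumption:
# check the token API docs for the exact call that applies to your setup).
```

Running this at the start of each job would keep the number of live PATs well below the limit, as long as no run needs its token for longer than the chosen age.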
