Generate a longer token for Databricks with Azure

Etyr
Contributor

I'm using DefaultAzureCredential from azure-identity to connect to Azure with service principal environment variables (AZURE_CLIENT_SECRET, AZURE_TENANT_ID, AZURE_CLIENT_ID).

I can call get_token for the Databricks-specific scope like this:

from azure.identity import DefaultAzureCredential

# 2ff814a6-... is the fixed application ID of the Azure Databricks resource
dbx_scope = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default"
token = DefaultAzureCredential().get_token(dbx_scope).token

So this is working great: I get the token, and then I can use `databricks-connect` to configure my connection to the cluster. This generates a configuration file ($HOME/.databricks-connect) so Spark knows where to connect and which token to use.

{
  "host": "https://adb-1234.azuredatabricks.net",
  "token": "eyJ0eXAiXXXXXXXXXXXXXXXXXXXXXx",
  "cluster_id": "1234",
  "org_id": "1234",
  "port": "15001"
}
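
For reference, a minimal sketch of regenerating that file on each run with a fresh token (host, cluster_id, org_id and port are the example values from above):

import json
from pathlib import Path

from azure.identity import DefaultAzureCredential

dbx_scope = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default"
token = DefaultAzureCredential().get_token(dbx_scope).token

config = {
    "host": "https://adb-1234.azuredatabricks.net",
    "token": token,
    "cluster_id": "1234",
    "org_id": "1234",
    "port": "15001",
}

# databricks-connect reads this file when the Spark session starts
Path.home().joinpath(".databricks-connect").write_text(json.dumps(config, indent=2))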

The issue is that this token does not last very long. When I use Spark for more than an hour, I get disconnected because the token has expired.

Is there a way to get a longer-lived token for Databricks with a service principal? Since this is meant for production, I would like my code to generate a PAT for each run; I don't want to create a PAT manually and store it in an Azure Key Vault.

1 ACCEPTED SOLUTION


Etyr
Contributor

I came up with an alternative solution: I wrote my own Python class to handle my PAT from Databricks: https://stackoverflow.com/questions/75071869/python-defaultazurecredential-get-token-set-expiration-...

You can get fancier, or even register an atexit handler inside the class to destroy the PAT. But this has a side effect: the Python process will exit with no error code, but if you have a logger, it will warn you that the connections to Databricks were closed because of an invalid token. Which is "normal", but ugly.
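
Roughly, the idea looks like this (a simplified sketch, not the exact class from the link; the endpoints are the documented Token API /api/2.0/token/create and /api/2.0/token/delete, while the lifetime and comment are placeholders):

import atexit

import requests
from azure.identity import DefaultAzureCredential

class DatabricksPAT:
    """Trade the service principal's short-lived AAD token for a PAT,
    and delete the PAT again when the Python process exits."""

    def __init__(self, host, lifetime_seconds=86400):
        self.host = host
        scope = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default"
        aad_token = DefaultAzureCredential().get_token(scope).token
        self._headers = {"Authorization": f"Bearer {aad_token}"}
        resp = requests.post(
            f"{host}/api/2.0/token/create",
            headers=self._headers,
            json={"lifetime_seconds": lifetime_seconds, "comment": "per-run PAT"},
        )
        resp.raise_for_status()
        body = resp.json()
        self.token = body["token_value"]                # the PAT to hand to Spark
        self._token_id = body["token_info"]["token_id"]
        atexit.register(self._revoke)                   # destroy the PAT on exit

    def _revoke(self):
        # Best effort: on long runs the AAD token used here may itself
        # have expired, in which case the PAT simply expires on its own.
        requests.post(
            f"{self.host}/api/2.0/token/delete",
            headers=self._headers,
            json={"token_id": self._token_id},
        )

You would then write pat.token into $HOME/.databricks-connect instead of the raw AAD token.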


5 REPLIES

Hubert-Dudek
Esteemed Contributor III

There is a REST API endpoint to manage tokens:

https://docs.databricks.com/dev-tools/api/latest/token-management.html

So using your code, you already get the host and a short-lived token. All you need to do is call the REST API with it, which will generate a long-lived token for the connection.

Create a token on behalf of a service principal. >> https://docs.databricks.com/dev-tools/api/latest/token-management.html#operation/create-obo-token
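
For example, a sketch of that call (it assumes the caller has workspace admin rights; the host is a placeholder and the lifetime is just an example):

import os

import requests
from azure.identity import DefaultAzureCredential

host = "https://adb-1234.azuredatabricks.net"  # placeholder
scope = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default"
aad_token = DefaultAzureCredential().get_token(scope).token

resp = requests.post(
    f"{host}/api/2.0/token-management/on-behalf-of/tokens",
    headers={"Authorization": f"Bearer {aad_token}"},
    json={
        "application_id": os.environ["AZURE_CLIENT_ID"],  # the service principal
        "lifetime_seconds": 7 * 24 * 3600,                # e.g. one week
        "comment": "long-lived token for databricks-connect",
    },
)
resp.raise_for_status()
long_lived_token = resp.json()["token_value"]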

Etyr
Contributor

The issue with this (I think) is that it will create a new token for each run of my code in Azure ML. So if I get over 600 runs, I generate 600 PATs, and that's the Databricks limit on PATs. The next runs won't be able to create new tokens and would be stuck.

Is there a way to remove "old" PATs, for example PATs that are older than 24 hours?

I was also thinking of a solution that keeps the short-lived token: every X minutes I ask for a new one, but then I have to re-initialize my SparkSession and lose all my work. Isn't there a way to inject the new token into the Spark config?

Something like this:

spark_session.conf.set("spark.some.option.token", new_token)

Hubert-Dudek
Esteemed Contributor III

There are API calls to delete or manage tokens, so you can implement your own logic.
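
For example, a sketch that lists all tokens via the Token Management API and deletes the ones older than 24 hours (assumes admin access; the host is a placeholder):

import time

import requests
from azure.identity import DefaultAzureCredential

host = "https://adb-1234.azuredatabricks.net"  # placeholder
scope = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default"
headers = {"Authorization": f"Bearer {DefaultAzureCredential().get_token(scope).token}"}

resp = requests.get(f"{host}/api/2.0/token-management/tokens", headers=headers)
resp.raise_for_status()

cutoff_ms = (time.time() - 24 * 3600) * 1000  # creation_time is epoch millis

for info in resp.json().get("token_infos", []):
    if info["creation_time"] < cutoff_ms:
        requests.delete(
            f"{host}/api/2.0/token-management/tokens/{info['token_id']}",
            headers=headers,
        ).raise_for_status()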

Kaniz
Community Manager

Hi @Antoine Tavernier, we haven't heard from you since the last response from @Hubert Dudek, and I was checking back to see if his suggestions helped you.

Or else, if you have found a solution, please share it with the community, as it can be helpful to others.

Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.

