01-11-2023 06:15 AM
I'm using DefaultAzureCredential from azure-identity to connect to Azure with service principal environment variables (AZURE_CLIENT_SECRET, AZURE_TENANT_ID, AZURE_CLIENT_ID).
I can get_token from a specific scope for databricks like this:
from azure.identity import DefaultAzureCredential
dbx_scope = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default"
token = DefaultAzureCredential().get_token(dbx_scope).token
So this is working great: I get the token, and then I can use `databricks-connect` to configure my connection to the cluster. This generates a configuration file ($HOME/.databricks-connect) that tells Spark where to connect and which token to use.
{
  "host": "https://adb-1234.azuredatabricks.net",
  "token": "eyJ0eXAiXXXXXXXXXXXXXXXXXXXXXx",
  "cluster_id": "1234",
  "org_id": "1234",
  "port": "15001"
}
The issue is that this token does not last very long. When I use Spark for more than an hour, I get disconnected because the token has expired.
Is there a way to get a longer-lived token for Databricks with a service principal? Since this is aimed at production, I would like my code to generate a PAT for each run; I don't want to create a PAT manually and store it in an Azure Key Vault.
01-11-2023 07:44 AM
There is a REST API endpoint to manage tokens:
https://docs.databricks.com/dev-tools/api/latest/token-management.html
Using your code, you already have the host and a short-lived token. All you need to do is call that REST API with them to generate a longer-lived token.
Create a token on behalf of a service principal. >> https://docs.databricks.com/dev-tools/api/latest/token-management.html#operation/create-obo-token
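For illustration, a rough sketch of what that on-behalf-of call could look like, assuming the requests library and that the caller has permission to use the Token Management API; the application_id, lifetime and host below are placeholders, not values from this thread:

import requests
from azure.identity import DefaultAzureCredential

dbx_scope = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default"
host = "https://adb-1234.azuredatabricks.net"  # placeholder workspace URL

# Short-lived AAD token for the service principal, as in the question
aad_token = DefaultAzureCredential().get_token(dbx_scope).token

# Ask the Token Management API for a PAT on behalf of the service principal.
# application_id is the service principal's client (application) ID; lifetime here is 24 h.
resp = requests.post(
    f"{host}/api/2.0/token-management/on-behalf-of/tokens",
    headers={"Authorization": f"Bearer {aad_token}"},
    json={
        "application_id": "00000000-0000-0000-0000-000000000000",  # placeholder
        "lifetime_seconds": 86400,
        "comment": "generated for this run",
    },
)
resp.raise_for_status()
pat = resp.json()["token_value"]

The resulting pat could then go into the $HOME/.databricks-connect file in place of the short-lived AAD token.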
01-11-2023 07:53 AM
The issue with this (I think) is that it will create a new token for each run of my code in Azure ML. So if I get over 600 runs, I generate 600 PATs, and that's the Databricks limit on PATs. The next runs won't be able to create new tokens and would be stuck.
Is there a way to remove "old" PATs, for example PATs that are older than 24 hours?
I was also thinking of a solution that keeps the short-lived token: every X minutes I would ask for a new one, but then I have to re-initialize my SparkSession and lose all the work. Isn't there a way to inject the new token into the Spark config?
Something like this:
spark_session.conf.set("spark.some.option.token", new_token)
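(For what it's worth, the legacy Databricks Connect configuration does expose a spark.databricks.service.token key, so the idea could look like the sketch below; whether a running session actually picks up the refreshed token without being rebuilt is exactly what I am unsure about:)

from azure.identity import DefaultAzureCredential

dbx_scope = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default"

# Refresh the AAD token and push it into the live Spark config.
# spark_session is the SparkSession already created through databricks-connect.
new_token = DefaultAzureCredential().get_token(dbx_scope).token
spark_session.conf.set("spark.databricks.service.token", new_token)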
01-11-2023 07:55 AM
There are API calls to delete or manage tokens, so you can implement your own cleanup logic.
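As a rough sketch of such cleanup logic (assuming the Token API's /api/2.0/token/list and /api/2.0/token/delete endpoints, and that creation_time is reported in epoch milliseconds; the host is a placeholder):

import time
import requests
from azure.identity import DefaultAzureCredential

dbx_scope = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default"
host = "https://adb-1234.azuredatabricks.net"  # placeholder workspace URL
headers = {"Authorization": f"Bearer {DefaultAzureCredential().get_token(dbx_scope).token}"}

# Anything created more than 24 hours ago is considered stale
cutoff_ms = (time.time() - 24 * 3600) * 1000

tokens = requests.get(f"{host}/api/2.0/token/list", headers=headers).json()
for info in tokens.get("token_infos", []):
    if info["creation_time"] < cutoff_ms:
        requests.post(
            f"{host}/api/2.0/token/delete",
            headers=headers,
            json={"token_id": info["token_id"]},
        )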
01-16-2023 08:15 AM
I came up with an alternative solution: I wrote my own Python class to handle my PAT from Databricks: https://stackoverflow.com/questions/75071869/python-defaultazurecredential-get-token-set-expiration-...
You can be fancier, or even register an atexit handler inside the class to destroy the PAT. But this has a side effect: the Python process exits with no error code, but if you have a logger, it will warn you that the connection with Databricks was closed because of an invalid token. Which is "normal", but ugly.
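In rough terms, the class looks something like the sketch below (a minimal version under my own assumptions, not the exact code from the StackOverflow post; it uses the Token API's /api/2.0/token/create and /api/2.0/token/delete endpoints and a one-hour lifetime for illustration):

import atexit
import requests
from azure.identity import DefaultAzureCredential


class DatabricksPAT:
    """Create a Databricks PAT from a service principal's AAD token and
    revoke it again when the Python process exits."""

    DBX_SCOPE = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default"

    def __init__(self, host, lifetime_seconds=3600, comment="azure-ml-run"):
        self.host = host.rstrip("/")
        aad_token = DefaultAzureCredential().get_token(self.DBX_SCOPE).token
        self._headers = {"Authorization": f"Bearer {aad_token}"}

        # Mint a PAT for the calling principal with a bounded lifetime
        resp = requests.post(
            f"{self.host}/api/2.0/token/create",
            headers=self._headers,
            json={"lifetime_seconds": lifetime_seconds, "comment": comment},
        )
        resp.raise_for_status()
        payload = resp.json()
        self.token = payload["token_value"]
        self._token_id = payload["token_info"]["token_id"]

        # Destroy the PAT when the interpreter shuts down
        atexit.register(self._revoke)

    def _revoke(self):
        requests.post(
            f"{self.host}/api/2.0/token/delete",
            headers=self._headers,
            json={"token_id": self._token_id},
        )

Usage is then pat = DatabricksPAT("https://adb-1234.azuredatabricks.net"), with pat.token going into the databricks-connect configuration; the atexit revocation is what produces the "invalid token" warning mentioned above.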