01-11-2023 06:15 AM
I'm using DefaultAzureCredential from azure-identity to authenticate to Azure with the service principal environment variables (AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_CLIENT_SECRET).
I can call get_token with the Azure Databricks resource scope like this:
from azure.identity import DefaultAzureCredential

# 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d is the application ID of the Azure Databricks resource
dbx_scope = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default"
token = DefaultAzureCredential().get_token(dbx_scope).token
This works great: I get the token, and I can then use `databricks-connect configure` to set up my connection to the cluster. This generates a configuration file ($HOME/.databricks-connect) so Spark knows where to connect and which token to use.
{
  "host": "https://adb-1234.azuredatabricks.net",
  "token": "eyJ0eXAiXXXXXXXXXXXXXXXXXXXXXx",
  "cluster_id": "1234",
  "org_id": "1234",
  "port": "15001"
}
The issue is that this token does not last very long: when I use Spark for more than an hour, I get disconnected because the token has expired.
Is there a way to get a longer-lived token for Databricks with a service principal? Since this is intended for production, I want my code to generate a PAT for each run; I don't want to create a PAT manually and store it in Azure Key Vault.
Accepted Solutions
01-16-2023 08:15 AM
I came up with an alternative solution: I wrote my own Python class to handle my PAT from Databricks: https://stackoverflow.com/questions/75071869/python-defaultazurecredential-get-token-set-expiration-...
You can get fancier, or even register an atexit handler inside the class to destroy the PAT. But this has a side effect: the Python process exits with no error code, but if you have a logger, it will warn you that the connections to Databricks were closed because of an invalid token. Which is "normal", but ugly.
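Roughly, a minimal sketch of what such a class can look like, using the token-management endpoints linked in the replies below (the class name and placeholders are illustrative; the real code in the linked answer may differ). Note that creating on-behalf-of tokens typically requires workspace admin permissions.

import atexit
import requests
from azure.identity import DefaultAzureCredential

DBX_SCOPE = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default"

class PatManager:
    """Create a per-run PAT for a service principal and revoke it at exit."""

    def __init__(self, host, application_id, lifetime_seconds=86400):
        self.host = host
        aad_token = DefaultAzureCredential().get_token(DBX_SCOPE).token
        self._headers = {"Authorization": f"Bearer {aad_token}"}
        # Create a PAT on behalf of the service principal
        resp = requests.post(
            f"{host}/api/2.0/token-management/on-behalf-of/tokens",
            headers=self._headers,
            json={
                "application_id": application_id,
                "lifetime_seconds": lifetime_seconds,
                "comment": "per-run PAT",
            },
        )
        resp.raise_for_status()
        body = resp.json()
        self.pat = body["token_value"]
        self._token_id = body["token_info"]["token_id"]
        # Best-effort cleanup; this triggers the "invalid token" warning mentioned above
        atexit.register(self.revoke)

    def revoke(self):
        requests.delete(
            f"{self.host}/api/2.0/token-management/tokens/{self._token_id}",
            headers=self._headers,
        )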
01-11-2023 07:44 AM
There is a REST API to manage tokens:
https://docs.databricks.com/dev-tools/api/latest/token-management.html
Using your code, you already get the host and a short-lived token. So all you need to do is call that REST API with it, and it will generate a long-lived token for your connections.
Create a token on behalf of a service principal. >> https://docs.databricks.com/dev-tools/api/latest/token-management.html#operation/create-obo-token
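For example, a minimal sketch of that call (the host is the one from your config, the application ID is a placeholder, and `aad_token` is the short-lived token from your code above):

import requests

resp = requests.post(
    "https://adb-1234.azuredatabricks.net/api/2.0/token-management/on-behalf-of/tokens",
    headers={"Authorization": f"Bearer {aad_token}"},
    json={
        "application_id": "<service-principal-application-id>",
        "lifetime_seconds": 86400,  # e.g. 24 hours
        "comment": "created for this run",
    },
)
resp.raise_for_status()
pat = resp.json()["token_value"]  # long-lived PAT to use in .databricks-connect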
01-11-2023 07:53 AM
The issue with this (I think) is that it will create a new token for each run of my code in Azure ML. So if I get over 600 runs, I will have generated 600 PATs, which is the Databricks limit on PATs. The next runs won't be able to create new tokens and will get stuck.
Is there a way to remove "old" PATs, for example PATs that are older than 24 hours?
I was also thinking of a solution that keeps the short-lived token: every X minutes I ask for a new token, but then I have to re-initialize my SparkSession and lose all my work. Isn't there a way to inject the token into the Spark config?
Something like this:
spark_session.conf.set("spark.some.option.token", new_token)
01-11-2023 07:55 AM
There are API calls to list and delete tokens, so you can implement your own cleanup logic.
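For example, a minimal sketch of a cleanup pass using the list and delete endpoints of the same token-management API (the 24-hour policy and function name are just illustrations):

import time
import requests

def delete_old_pats(host, aad_token, max_age_hours=24):
    """Delete workspace PATs older than max_age_hours."""
    headers = {"Authorization": f"Bearer {aad_token}"}
    resp = requests.get(f"{host}/api/2.0/token-management/tokens", headers=headers)
    resp.raise_for_status()
    cutoff_ms = (time.time() - max_age_hours * 3600) * 1000
    for info in resp.json().get("token_infos", []):
        # creation_time is in epoch milliseconds
        if info["creation_time"] < cutoff_ms:
            resp = requests.delete(
                f"{host}/api/2.0/token-management/tokens/{info['token_id']}",
                headers=headers,
            )
            resp.raise_for_status()

You could call this at the start of each run, before creating the run's own PAT.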

