01-11-2023 06:15 AM
I'm using DefaultAzureCredential from azure-identity to connect to Azure with service principal environment variables (AZURE_CLIENT_SECRET, AZURE_TENANT_ID, AZURE_CLIENT_ID).
I can call `get_token` for the Databricks-specific scope like this:
from azure.identity import DefaultAzureCredential

# Resource ID of the Azure Databricks first-party application
dbx_scope = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default"
token = DefaultAzureCredential().get_token(dbx_scope).token
So this is working great: I get the token, and then I can use `databricks-connect` to configure my connection to the cluster. This generates a configuration file ($HOME/.databricks-connect) that tells Spark where to connect and which token to use.
{
"host": "https://adb-1234.azuredatabricks.net",
"token": "eyJ0eXAiXXXXXXXXXXXXXXXXXXXXXx",
"cluster_id": "1234",
"org_id": "1234",
"port": "15001"
}
The issue is that this token does not last very long. When I use Spark for more than an hour, I get disconnected because the token has expired.
Is there a way to get a longer-lived token for Databricks with a service principal? Since this is meant for production, I'd like my code to generate a PAT for each run; I don't want to create a PAT manually and store it in Azure Key Vault.
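For context, a minimal sketch of how that config file could be refreshed programmatically. The function names are hypothetical, and the host/cluster/org values below are placeholders; the scope GUID is the Azure Databricks resource ID from the snippet above:

```python
import json
from pathlib import Path

def build_databricks_connect_config(token: str, host: str, cluster_id: str,
                                    org_id: str, port: str = "15001") -> dict:
    """Assemble the dict that databricks-connect writes to ~/.databricks-connect."""
    return {"host": host, "token": token, "cluster_id": cluster_id,
            "org_id": org_id, "port": port}

def refresh_config(host: str, cluster_id: str, org_id: str) -> None:
    """Fetch a fresh AAD token and overwrite the config file with it."""
    from azure.identity import DefaultAzureCredential
    dbx_scope = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default"
    token = DefaultAzureCredential().get_token(dbx_scope).token
    cfg = build_databricks_connect_config(token, host, cluster_id, org_id)
    Path.home().joinpath(".databricks-connect").write_text(json.dumps(cfg, indent=2))
```

This only sidesteps the problem for a single run: the written token still expires after about an hour, which is exactly the limitation described above.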
01-11-2023 07:44 AM
There is a REST API endpoint to manage tokens:
https://docs.databricks.com/dev-tools/api/latest/token-management.html
So using your code, you get the host and a short-lived token. All you need to do is call the REST API, which can generate a long-lived token.
Create a token on behalf of a service principal >> https://docs.databricks.com/dev-tools/api/latest/token-management.html#operation/create-obo-token
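A hedged sketch of that call using only the standard library. The endpoint path and field names follow the linked Token Management docs, while `host`, the AAD token, and the application ID are placeholders you would supply; the `obo_payload` helper is just for illustration:

```python
import json
import urllib.request

def obo_payload(application_id: str, lifetime_seconds: int = 86400) -> dict:
    """Request body for the create-obo-token endpoint."""
    return {"application_id": application_id,   # the SP's application (client) ID
            "lifetime_seconds": lifetime_seconds,
            "comment": "generated for a databricks-connect run"}

def create_obo_token(host: str, aad_token: str, application_id: str,
                     lifetime_seconds: int = 86400) -> str:
    """POST to the Token Management API and return the new PAT value."""
    req = urllib.request.Request(
        f"{host}/api/2.0/token-management/on-behalf-of/tokens",
        data=json.dumps(obo_payload(application_id, lifetime_seconds)).encode(),
        headers={"Authorization": f"Bearer {aad_token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["token_value"]
```

The short-lived AAD token from the first post authorizes the call; the returned PAT can then live as long as `lifetime_seconds` allows.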
01-11-2023 07:53 AM
The issue with this (I think) is that it will create a new token for each run of my code in Azure ML. So if I get over 600 runs, I generate 600 PATs, and that's the Databricks limit on PATs. The next ones won't be able to create new tokens and those runs would be stuck.
Is there a way to remove "old" PATs, for example PATs that are older than 24 hours?
I was also thinking of a solution that keeps the host and the short-lived token: every X minutes I ask for a new token, but then I have to re-initialize my SparkSession and lose all my work. Isn't there a way to inject the token into the Spark config?
Something like this:
spark_session.conf.set("spark.some.option.token", new_token)
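For the cleanup idea, here is a sketch assuming the list endpoint returns `token_infos` entries carrying a `token_id` and a `creation_time` in epoch milliseconds, as described in the Token Management docs; `stale_token_ids` and `purge_old_tokens` are hypothetical helper names:

```python
import json
import time
import urllib.request

def stale_token_ids(token_infos: list, now_ms: int, max_age_hours: int = 24) -> list:
    """Pick token_ids whose creation_time (epoch ms) is older than max_age_hours."""
    cutoff = now_ms - max_age_hours * 3600 * 1000
    return [t["token_id"] for t in token_infos if t["creation_time"] < cutoff]

def purge_old_tokens(host: str, aad_token: str, max_age_hours: int = 24) -> None:
    """List the workspace's PATs and delete the ones past the age cutoff."""
    headers = {"Authorization": f"Bearer {aad_token}"}
    with urllib.request.urlopen(urllib.request.Request(
            f"{host}/api/2.0/token-management/tokens", headers=headers)) as resp:
        infos = json.load(resp).get("token_infos", [])
    for token_id in stale_token_ids(infos, int(time.time() * 1000), max_age_hours):
        urllib.request.urlopen(urllib.request.Request(
            f"{host}/api/2.0/token-management/tokens/{token_id}",
            headers=headers, method="DELETE"))
```

Running something like this at the start of each Azure ML job would keep the token count well under the limit.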
01-11-2023 07:55 AM
There are API calls to delete or manage tokens, so you can implement your own logic.
01-16-2023 07:10 AM
Hi @Antoine Tavernier, we haven't heard from you since the last response from @Hubert Dudek, and I was checking back to see if his suggestions helped you.
Or else, If you have any solution, please do share that with the community as it can be helpful to others.
Also, Please don't forget to click on the "Select As Best" button whenever the information provided helps resolve your question.
01-16-2023 08:15 AM
I came up with an alternative solution: my own Python class to handle my PAT from Databricks: https://stackoverflow.com/questions/75071869/python-defaultazurecredential-get-token-set-expiration-...
You can get fancier, or even register an atexit handler inside the class to destroy the PAT. But this has a side effect: the Python process exits with no error code, but if you have a logger, it will warn you that the connection to Databricks was closed because of an invalid token. That is "normal", but ugly.
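As a rough illustration of that pattern (not the exact class from the linked answer; the endpoints follow the Token Management API, and all connection values are placeholders you would supply):

```python
import atexit
import json
import urllib.request

class ManagedDatabricksPAT:
    """Create a PAT on startup and revoke it when the Python process exits."""

    def __init__(self, host: str, aad_token: str, application_id: str,
                 lifetime_seconds: int = 7200):
        self._host = host
        self._headers = {"Authorization": f"Bearer {aad_token}",
                         "Content-Type": "application/json"}
        body = json.dumps({"application_id": application_id,
                           "lifetime_seconds": lifetime_seconds,
                           "comment": "managed PAT"}).encode()
        req = urllib.request.Request(
            f"{host}/api/2.0/token-management/on-behalf-of/tokens",
            data=body, headers=self._headers, method="POST")
        with urllib.request.urlopen(req) as resp:
            payload = json.load(resp)
        self.token = payload["token_value"]
        self._token_id = payload["token_info"]["token_id"]
        # Revoking at exit means any still-open Spark connection will log an
        # "invalid token" warning -- expected, but noisy, as noted above.
        atexit.register(self.revoke)

    def revoke(self) -> None:
        urllib.request.urlopen(urllib.request.Request(
            f"{self._host}/api/2.0/token-management/tokens/{self._token_id}",
            headers=self._headers, method="DELETE"))
```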