Reading data from Serverless Warehouse from Azure Functions in Python - using managed identities

MRTN
Contributor

We are trying to run a simple service on an Azure Function app, where we need to query some data from a Databricks Warehouse. We want to avoid managing secrets, and hence try to use Microsoft Entra authentication all the way. Using various available online sources - we have tried the following steps. 

  • Enabled a system assigned "Managed Identity" on the Azure Function app
  • Created a service principal in the Databricks workspace - "Microsoft Entra ID managed". Linked the service principal to the Azure function by pasting in the application_id 
  • Given the service principal access to the Warehouse (Use)
  • Give service principal USE access to catalog and schema, SELECT access to relevant table
from databricks import sql
from azure.identity import DefaultAzureCredential, AzureCliCredential 
credential=DefaultAzureCredential()
token=credential.get_token("2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default").token
with sql.connect(
    server_hostname=DATABRICKS_HOST, 
    http_path=DATABRICKS_WAREHOSE, 
    access_token=token) as connection:
    with connection.cursor() as cursor:
        cursor.execute(f"SELECT * FROM {CATALOG}.{SCHEMA}.{TABLE}")
        result = cursor.fetchall()
        df=pd.DataFrame(result, columns=[desc[0] for desc in cursor.description])

When testing the Azure Function running locally (effectively using AzureCLICredentials and my own access), this works fine. However, when running the code on the function app itself it returns a 403: Forbidden error. 

When creating a OAuth secret in the principal, the table can be read using the following code. I guess this confirms that the service principal has the right access in Databricks. However, we would like to avoid managing secrets. 

from databricks.sdk.core import Config, oauth_service_principal

def credential_provider():
  config = Config(
    host          = f"https://{server_hostname}",
    client_id     = client_id,
    client_secret = token)
  return oauth_service_principal(config)

with sql.connect(server_hostname      = server_hostname,
                 http_path            = DATABRICKS_WAREHOSE,
                 credentials_provider = credential_provider) as connection:
    with connection.cursor() as cursor:
        cursor.execute(f"SELECT * FROM {CATALOG}.{SCHEMA}.{TABLE}")
        result = cursor.fetchall()

        df=pd.DataFrame(result, columns=[desc[0] for desc in cursor.description])
df.head()

Any suggestions on further actions? What are we missing if anything?