Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Access Azure storage with serverless compute

mai_luca
New Contributor III

I would like to know how to connect to Azure Blob Storage from a Python task in a workflow that runs on serverless compute. On a non-serverless cluster, or with serverless in a declarative pipeline, I would typically set the Azure storage account key with spark.conf.set, as shown below:

spark.conf.set(
    "fs.azure.account.key.<storage-account>.dfs.core.windows.net",
    dbutils.secrets.get(scope="<scope>", key="<storage-account-access-key>")
)

 

1 ACCEPTED SOLUTION


nayan_wylde
Honored Contributor

Use the code below in your notebook. You cannot set Spark config through the cluster settings on serverless, since there are no advanced options for serverless compute.

credential_id = dbutils.secrets.get(scope="{scope_name}", key="{app_id}")
credential_key = dbutils.secrets.get(scope="{scope_name}", key="{app_key}")

# Configure OAuth (service principal) access for the storage account in the notebook session.
spark.conf.set("fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net", credential_id)
spark.conf.set("fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net", credential_key)
spark.conf.set("fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net", "https://login.microsoftonline.com/{azure_tenant_id}/oauth2/token")
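Once these session configs are set, you can read the data directly by its abfss path. A minimal usage sketch (container, storage account, path, and format are placeholders):

df = spark.read.format("parquet").load(
    "abfss://{container}@{storage_account}.dfs.core.windows.net/{path}"
)
display(df)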

If you are using serverless, please use Unity Catalog external locations for data lake access instead.

https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-external-locations
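A minimal sketch of that approach, assuming a Unity Catalog storage credential (for example one backed by an Azure Databricks access connector / managed identity) already exists; all names and paths below are placeholders:

# Run once by a user with the CREATE EXTERNAL LOCATION privilege.
spark.sql("""
  CREATE EXTERNAL LOCATION IF NOT EXISTS my_external_location
  URL 'abfss://{container}@{storage_account}.dfs.core.windows.net/{path}'
  WITH (STORAGE CREDENTIAL my_storage_credential)
""")

# With the external location (and a READ FILES grant on it) in place,
# serverless compute can read the path directly, with no spark.conf.set needed.
df = spark.read.format("parquet").load(
    "abfss://{container}@{storage_account}.dfs.core.windows.net/{path}"
)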

 


2 REPLIES

CURIOUS_DE
Contributor III

Option 1: Use Azure Service Principal + ABFS OAuth Authentication (Recommended for Prod)

1. Register a Service Principal in Azure

  • Grant it access to the storage (container or storage account) with the Storage Blob Data Reader or Storage Blob Data Contributor role.

2. Mount using OAuth credentials

configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": dbutils.secrets.get(scope="<scope>", key="<client-id-key>"),
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope>", key="<client-secret-key>"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# Mount the ADLS Gen2 container at a DBFS mount point using the service principal credentials.
dbutils.fs.mount(
    source = "abfss://<container>@<storage-account>.dfs.core.windows.net/",
    mount_point = "/mnt/<mount-name>",
    extra_configs = configs)

Note: this approach avoids using account keys (which are less secure). Be aware, though, that DBFS mounts are not supported on serverless compute, so for serverless jobs prefer Unity Catalog external locations (as in the accepted answer).
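A quick usage sketch once the mount above is in place (on classic compute; the mount name and file path are placeholders):

df = spark.read.option("header", "true").csv("/mnt/<mount-name>/<path-to-file>.csv")
display(df)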

Supported in Serverless

  • OAuth (service principal) - Yes (recommended)
  • dbutils.secrets - Yes

Not Supported in Serverless

  • spark.conf.set for sensitive keys - No
  • Environment variables - No


Best Practice

  • Use Azure AD OAuth (Service Principal) wherever possible.

  • Store secrets in Databricks Secrets and access them securely.

Databricks Solution Architect
