Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Configure SAS Token for ADLS Access in Databricks Job (Works on Classic Cluster, Fails on Serverless)

smpa01
Contributor

I am running a Databricks job that reads from a Delta table and writes to an ADLS Gen2 location using a SAS token for authentication.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sas_token = dbutils.secrets.get(scope=scope, key=adls_secret_key_name)
spark.conf.set("fs.azure.sas.<container>.<storage-account>.blob.core.windows.net", sas_token)

df = spark.read.table("my_catalog.my_schema.my_table")
df.write.format("delta").mode("overwrite").save("abfss://<container>@<storage-account>.dfs.core.windows.net/my/path")

This works when I run the job on a classic cluster, but fails with the following error when I use serverless compute (serverless jobs compute or a SQL warehouse):

[CONFIG_NOT_AVAILABLE] Configuration fs.azure.sas.<container>.<storage-account>.blob.core.windows.net is not available. SQLSTATE: 42K0I

Questions:

  • Why does this work on classic clusters but not on Serverless?
  • What is the recommended way to provide the SAS token for ADLS access in Serverless jobs?

Thanks for your help!

 


 

3 REPLIES

szymon_dybczak
Esteemed Contributor III

Hi @smpa01 ,

Unfortunately, serverless compute supports only a limited number of Spark properties. Below you can find all the properties that can be configured. The one you're trying to set is not supported, hence the error:

https://docs.databricks.com/aws/en/spark/conf#configure-spark-properties-for-serverless-notebooks-an...
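For illustration, here is a minimal sketch (assuming spark.sql.session.timeZone is on the serverless allow-list described in the linked docs, while fs.azure.* properties are not):

# Allow-listed session property: this call succeeds on serverless.
spark.conf.set("spark.sql.session.timeZone", "UTC")

# Hadoop/Azure storage property: not allow-listed, so serverless rejects it
# with [CONFIG_NOT_AVAILABLE] instead of silently ignoring it.
spark.conf.set(
    "fs.azure.sas.<container>.<storage-account>.blob.core.windows.net",
    sas_token,
)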

MoJaMa
Databricks Employee

Agree with @szymon_dybczak 

@smpa01  Are you doing this from a classic cluster in dedicated access mode? Have you tried a classic cluster in standard access mode? The latter is closest to serverless in terms of secure sandboxing. Serverless is the strictest due to the combination of the Spark Connect architecture and the intentional design choice to be as knobless as possible.

In UC standard access mode and on Serverless (which uses UC standard under the hood), the recommendation is to use an Access Connector with a managed identity (in UC, this is your Storage Credential) to secure your container (in UC, this is your External Location).

Docs: https://learn.microsoft.com/en-us/azure/databricks/connect/unity-catalog/cloud-storage/azure-managed...

Reason why it doesn't work:

In Unity Catalog standard access mode, Hadoop/Spark filesystem configs like fs.azure.* are intentionally ignored for data access, so injecting SAS tokens via spark.conf.set(...) won’t be used. You must access storage through Unity Catalog’s governed paths (external locations or volumes) instead.
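For example, once the path is governed (a hedged sketch; the volume name is hypothetical and assumes a UC volume has been created over the same container), no fs.azure.* configuration is involved at all:

# Access through a Unity Catalog volume; authorization comes from UC grants,
# not from SAS tokens injected into the Spark conf.
dbutils.fs.ls("/Volumes/my_catalog/my_schema/my_volume/my/path")
df = spark.read.parquet("/Volumes/my_catalog/my_schema/my_volume/my/path")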

SteveOstrowski
Databricks Employee

Hi @smpa01,

The reason this works on a classic cluster but fails on serverless is that serverless compute only supports a very limited set of Spark configuration properties. The fs.azure.sas.* Hadoop configurations you are setting via spark.conf.set are not in the allowed list, so serverless rejects them with the CONFIG_NOT_AVAILABLE error.

On classic clusters, you have full control over the Spark and Hadoop configuration namespace, which is why setting the SAS token via spark.conf.set works there. On serverless compute, Databricks manages the infrastructure and restricts configuration to a small set of supported properties (things like spark.sql.shuffle.partitions, spark.sql.session.timeZone, spark.sql.ansi.enabled, and a few others). Arbitrary Hadoop/Azure storage configurations are not permitted.
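As a rough illustration of the failure mode (the exact error text may differ slightly):

try:
    # Rejected on serverless because fs.azure.* is not a supported property.
    spark.conf.set(
        "fs.azure.sas.<container>.<storage-account>.blob.core.windows.net",
        sas_token,
    )
except Exception as e:
    print(e)  # [CONFIG_NOT_AVAILABLE] Configuration fs.azure.sas... is not available.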

The full list of supported serverless Spark properties is documented here:
https://learn.microsoft.com/en-us/azure/databricks/spark/conf#serverless


RECOMMENDED APPROACH: UNITY CATALOG EXTERNAL LOCATIONS

The supported way to access ADLS Gen2 from serverless compute is through Unity Catalog storage credentials and external locations. This approach centralizes access governance and works seamlessly across all compute types, including serverless.

Here is the high-level workflow:

1. Create an Azure Databricks Access Connector in the Azure Portal. This is a first-party Azure resource that provides a managed identity for authenticating to your storage account.

2. Grant the Access Connector's managed identity the "Storage Blob Data Contributor" role (or a more limited role if you only need read access) on your ADLS Gen2 storage account or container.

3. Create a Unity Catalog storage credential that references the Access Connector:

CREATE STORAGE CREDENTIAL my_adls_credential
WITH (
  AZURE_MANAGED_IDENTITY = (
    ACCESS_CONNECTOR_ID = '/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Databricks/accessConnectors/<connector-name>'
  )
);
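You can optionally verify the credential from a notebook (a hedged sketch; the output columns may vary by release):

spark.sql("DESCRIBE STORAGE CREDENTIAL my_adls_credential").show(truncate=False)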

4. Create an external location that maps your ADLS path to the storage credential:

CREATE EXTERNAL LOCATION my_adls_location
URL 'abfss://<container>@<storage-account>.dfs.core.windows.net/my/path'
WITH (STORAGE CREDENTIAL my_adls_credential);
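Then grant the relevant privileges on the external location and, optionally, sanity-check access (a hedged sketch; the data_engineers group is a placeholder for your own principal):

# Grant read/write file access on the external location to a group (placeholder name).
spark.sql("GRANT READ FILES, WRITE FILES ON EXTERNAL LOCATION my_adls_location TO `data_engineers`")

# List files under the governed path to confirm access works end to end.
spark.sql("LIST 'abfss://<container>@<storage-account>.dfs.core.windows.net/my/path'").show()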

5. Once the external location is in place, your write code works on serverless without any spark.conf.set calls:

df = spark.read.table("my_catalog.my_schema.my_table")
df.write.format("delta").mode("overwrite").save(
    "abfss://<container>@<storage-account>.dfs.core.windows.net/my/path"
)

Unity Catalog handles the authentication transparently. No secrets or SAS tokens need to be embedded in your notebook or job code.


WHY MANAGED IDENTITY OVER SAS TOKENS

Using an Azure managed identity through Unity Catalog has several advantages over SAS tokens:

- No secret rotation: managed identities do not require you to maintain credentials or rotate secrets.
- Network rule support: managed identities can access storage accounts protected by Azure storage firewalls, which is not possible with SAS tokens or service principals.
- Centralized governance: Unity Catalog provides fine-grained access control (grants/revokes) on the external location, so you manage who can read or write to that path in one place.
- Works everywhere: the same external location works on classic clusters, serverless notebooks, serverless jobs, and SQL warehouses.


DOCUMENTATION REFERENCES

- Serverless compute limitations:
https://learn.microsoft.com/en-us/azure/databricks/compute/serverless/limitations

- Supported Spark properties on serverless:
https://learn.microsoft.com/en-us/azure/databricks/spark/conf#serverless

- Connect to cloud storage using Unity Catalog:
https://learn.microsoft.com/en-us/azure/databricks/connect/unity-catalog/cloud-storage/

- Create a storage credential for ADLS:
https://learn.microsoft.com/en-us/azure/databricks/connect/unity-catalog/cloud-storage/storage-crede...

- Configure managed identity for Unity Catalog:
https://learn.microsoft.com/en-us/azure/databricks/connect/unity-catalog/cloud-storage/azure-managed...


If you have any questions about setting up the Access Connector or external location, feel free to follow up.

* This reply was drafted with an agent system I built that researches and drafts responses based on the wide set of documentation I have available and previous memory. I personally review each draft for obvious issues and to monitor the system's reliability, and I update it when I detect drift, but there is still a small chance something is inaccurate, especially if you are experimenting with brand-new features.