02-27-2025 03:16 PM - edited 02-27-2025 03:23 PM
I have my own Autoloader repo, which is responsible for ingesting data from the landing layer (ADLS) and loading it into the raw layer in Databricks. In that repo I created a couple of workflows that run on serverless compute, and I use a .whl Python package as a dependent library in my tasks.
I have an NCC (Network Connectivity Configuration) in place but I am still getting an error, because I set a couple of Spark configurations in this repo.
I set the following configuration in a .py file:
def set_storage_account_config(
    storage_account: str,
    secret_scope: str,
    spn_tenant_id_key: str,
    spn_client_id_key: str,
    spn_client_secret_key: str,
) -> None:
    """
    Fetch the SPN credentials from Key Vault using the provided key names and
    scope, and configure Spark to use this SPN when connecting to the given
    ADLS Gen2 storage account.
    """
    logger.info(f"Setting spark config for storage account '{storage_account}'")
    spn_tenant_id = dbutils.secrets.get(scope=secret_scope, key=spn_tenant_id_key)
    spn_client_id = dbutils.secrets.get(scope=secret_scope, key=spn_client_id_key)
    spn_client_secret = dbutils.secrets.get(scope=secret_scope, key=spn_client_secret_key)
    spark.conf.set(
        f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth"
    )
    spark.conf.set(
        f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net",
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    )
    spark.conf.set(
        f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net",
        spn_client_id,
    )
    spark.conf.set(
        f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net",
        spn_client_secret,
    )
    spark.conf.set(
        f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net",
        f"https://login.microsoftonline.com/{spn_tenant_id}/oauth2/token",
    )
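For context, the function is called once per storage account before any reads; the scope and key names below are placeholders, not the real values:

# Hypothetical example call; replace the scope and key names with your own
set_storage_account_config(
    storage_account="adlsxxxxxx",
    secret_scope="my-keyvault-scope",
    spn_tenant_id_key="spn-tenant-id",
    spn_client_id_key="spn-client-id",
    spn_client_secret_key="spn-client-secret",
)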
and this Delta table properties configuration in another .py file:
def set_delta_table_properties(delta_table_properties: dict) -> None:
    """
    Take a dictionary of Delta table properties and set each of them as Spark
    session defaults. For a complete list check this:
    https://docs.databricks.com/en/delta/table-properties.html. This function
    should be called only once, before loading any of the sources.
    """
    logger.info("Setting spark session delta table properties")
    logger.debug(f"Using this config '{delta_table_properties}'")
    # Set the properties for the spark session
    for k, v in delta_table_properties.items():
        logger.debug(f"Setting 'spark.databricks.delta.properties.defaults.{k}' to '{v}'")
        spark.sql(f"set spark.databricks.delta.properties.defaults.{k} = {v}")
However, when I run the workflow on serverless compute, I get the following error:
Error:
[CONFIG_NOT_AVAILABLE] Configuration fs.azure.account.auth.type.adlsxxxxxx.dfs.core.windows.net is not available. SQLSTATE: 42K0I
How can I access files stored in ADLS with serverless?
Thank you.
Accepted Solutions
a month ago
The recommended approach for accessing cloud storage is to create Databricks storage credentials. These storage credentials can refer to Microsoft Entra service principals, managed identities, etc. After a credential is created, create an external location on top of it. Once that is done, you will be able to access the ADLS location without any additional Spark configuration.
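As a minimal sketch, assuming a storage credential has already been created (for example in Catalog Explorer, backed by an Azure managed identity / access connector), and with the credential name, container, storage account, and principal below as placeholders:

# Minimal sketch: names and paths are placeholders; the storage credential
# `adls_mi_credential` is assumed to already exist in Unity Catalog.
spark.sql("""
    CREATE EXTERNAL LOCATION IF NOT EXISTS landing_adls
    URL 'abfss://landing@adlsxxxxxx.dfs.core.windows.net/'
    WITH (STORAGE CREDENTIAL adls_mi_credential)
""")

# Grant the principal that runs the workflow access to read the location
spark.sql("GRANT READ FILES ON EXTERNAL LOCATION landing_adls TO `my_workflow_principal`")

# After that, serverless compute can read the path directly; no
# fs.azure.account.* spark.conf settings are needed, so the
# CONFIG_NOT_AVAILABLE error goes away.
df = spark.read.format("json").load("abfss://landing@adlsxxxxxx.dfs.core.windows.net/some/path")

With the external location in place, the abfss:// path resolves through Unity Catalog, so the set_storage_account_config step is no longer needed for serverless workflows.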