Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

DLT Pipeline, Autoloader, Streaming Query Exception: Could not find ADLS Gen2 Token

databricks8923
New Contributor

I have set up Autoloader to create a streaming table in my DLT pipeline:

 

import dlt

@dlt.table
def streamFiles_new():
    return (
        spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .option("cloudFiles.inferColumnTypes", "true")
            .option("multiLine", "true")
            .load("file_location")
    )
 
When I run this cell in the notebook it goes through and infers the schema of the Delta Live Table. However, when I run the DLT pipeline that includes the notebook with this autoloader cell I get an error: "org.apache.spark.sql.streaming.StreamingQueryException: [STREAM_FAILED]...terminated with exception: Could not find ADLS Gen2 Token"
I know we have ADLS Gen2 storage, and in my DLT pipeline settings I have Compute set to "DLT Compute AAD Passthrough". I know the AAD passthrough is working and that the storage location is mounted, because the DLT pipeline worked and was reading files from this location for materialized views. It only broke when I tried to switch to Autoloader for a streaming table. I do not have permissions under my Azure account to generate access keys for storage accounts.
1 REPLY

mark_ott
Databricks Employee

Your error suggests that while your DLT pipeline works for materialized views (batch reads), switching to a streaming table with Autoloader (readStream) triggers an ADLS Gen2 authentication failure in the streaming context: "Could not find ADLS Gen2 Token".

Why This Happens

Autoloader (readStream) in DLT pipelines runs on Spark Structured Streaming. Accessing ADLS Gen2 storage with AAD Passthrough works for batch queries and mounted paths, but long-running streaming queries need to re-authenticate continuously, and the user's ADLS token may not persist or refresh correctly. Some APIs are also sensitive to the authentication context, especially in DLT pipelines, which run in a managed service context.

Troubleshooting & Solutions

1. Use Service Principal or Managed Identity

  • AAD Passthrough limitations: For streaming (Autoloader), AAD Passthrough sometimes fails because tokens are not refreshed correctly for long-running streams.

  • The best practice is to configure the DLT pipeline to use a service principal (via Spark configs) or a managed identity for Databricks, which removes the dependency on user tokens.

2. Storage Mounts and Streaming

  • While mounting works for batch jobs, mounts are not supported for streaming with Autoloader. Always use the direct abfss:// path rather than /mnt/path for streaming.
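  • As an illustration, here is a minimal sketch of the same streaming table pointed at a direct abfss:// path instead of a mount. The container, storage account, and directory names below are placeholders, not values from the original question:

    import dlt

    # Hypothetical direct ADLS Gen2 path; replace container/account/directory with your own.
    # This takes the place of a mount-style path such as "/mnt/raw/events/".
    SOURCE_PATH = "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/raw/events/"

    @dlt.table
    def streamFiles_new():
        return (
            spark.readStream.format("cloudFiles")
                .option("cloudFiles.format", "json")
                .option("cloudFiles.inferColumnTypes", "true")
                .option("multiLine", "true")
                .load(SOURCE_PATH)
        )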

3. Pipeline Configuration

  • Ensure these Spark configs are set, either in the notebook or DLT pipeline settings (replace placeholders):

    spark.conf.set("fs.azure.account.auth.type.<STORAGE_ACCOUNT>.dfs.core.windows.net", "OAuth")
    spark.conf.set("fs.azure.account.oauth.provider.type.<STORAGE_ACCOUNT>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    spark.conf.set("fs.azure.account.oauth2.client.id.<STORAGE_ACCOUNT>.dfs.core.windows.net", "<CLIENT_ID>")
    spark.conf.set("fs.azure.account.oauth2.client.secret.<STORAGE_ACCOUNT>.dfs.core.windows.net", "<CLIENT_SECRET>")
    spark.conf.set("fs.azure.account.oauth2.client.endpoint.<STORAGE_ACCOUNT>.dfs.core.windows.net", "https://login.microsoftonline.com/<TENANT_ID>/oauth2/token")
    • These configs require you to register a service principal or use Databricks managed identity, not your own credentials.
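  • If a service principal is provisioned for you, avoid hard-coding the secret in the notebook. A minimal sketch using a Databricks secret scope (the scope and key names here are hypothetical placeholders your admin would create):

    # Hypothetical secret scope ("adls-auth") and key names; replace with your own.
    storage_account = "<STORAGE_ACCOUNT>"
    client_id = dbutils.secrets.get(scope="adls-auth", key="sp-client-id")
    client_secret = dbutils.secrets.get(scope="adls-auth", key="sp-client-secret")
    tenant_id = dbutils.secrets.get(scope="adls-auth", key="tenant-id")

    spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth")
    spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net",
                   "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net", client_id)
    spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net", client_secret)
    spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net",
                   f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")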

4. Autoloader Options

  • Confirm you are using abfss://container@account.dfs.core.windows.net/... in your .load() call rather than a mount path.

  • Check with your Azure admin/support to see if you can provision a service principal or managed identity for Databricks.
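  • Before re-running the pipeline, a quick sanity check from a regular notebook can confirm that the direct path and OAuth configs resolve. A sketch, assuming the same placeholder path as in the earlier example:

    # Hypothetical path; replace with your container/account/directory.
    path = "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/raw/events/"

    # If listing and a small batch read both succeed, the auth configs are being picked up correctly.
    display(dbutils.fs.ls(path))
    display(spark.read.format("json").option("multiLine", "true").load(path).limit(10))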

5. Permissions

  • The service principal or managed identity will need Storage Blob Data Contributor access on the ADLS Gen2 storage account for streaming operations.

What You Can Do

  • Contact your Azure admin to configure a service principal or assign managed identity permissions to the Databricks workspace.

  • Update your DLT pipeline configuration to use direct ADLS Gen2 paths and appropriate credential configs.

  • Avoid relying solely on AAD passthrough for structured streaming workloads in DLT.

Additional Resources

For step-by-step configuration details, review the Databricks documentation on Auto Loader and on connecting to Azure Data Lake Storage Gen2.