Issues loading data from ADLS in DLT

guostong · 05-01-2023

I am using DLT to load CSV files from ADLS. Below is the SQL query in my notebook:

CREATE OR REFRESH STREAMING LIVE TABLE test_account_raw
AS SELECT * FROM cloud_files(
  "abfss://my_container@my_storageaccount.dfs.core.windows.net/test_csv/", 
  "csv", 
  map("header", "true"));

Below is the configuration in my Delta Live Tables pipeline settings, which I added in order to access ADLS:

    "configuration": {
        "fs.azure.account.auth.type.my_storageaccount.dfs.core.windows.net": "OAuth",
        "fs.azure.account.oauth.provider.type.my_storageaccount.dfs.core.windows.net": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id.my_storageaccount.dfs.core.windows.net": "my_client_id",
        "fs.azure.account.oauth2.client.secret.my_storageaccount.dfs.core.windows.net": "my_secret",
        "fs.azure.account.oauth2.client.endpoint.my_storageaccount.dfs.core.windows.net": "https://login.microsoftonline.com/my_tenant_id/oauth2/token"
    }
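
The five keys above all follow one pattern: a Hadoop ABFS OAuth setting name suffixed with the storage account's host name. As a minimal sketch (the helper `adls_oauth_conf` and its parameters are hypothetical, not part of any Databricks API; it only assembles strings), the same map can be generated in Python so the account name is written once and a typo in one key is less likely:

```python
import json


def adls_oauth_conf(account, client_id, client_secret, tenant_id):
    """Build the OAuth settings shown above for one storage account.

    Hypothetical helper for illustration only; it does not contact Azure
    or Databricks, it just formats the key/value pairs.
    """
    host = f"{account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{host}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{host}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{host}": client_id,
        f"fs.azure.account.oauth2.client.secret.{host}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{host}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


if __name__ == "__main__":
    # Placeholder values copied from the post, not real credentials.
    conf = adls_oauth_conf(
        "my_storageaccount", "my_client_id", "my_secret", "my_tenant_id"
    )
    print(json.dumps({"configuration": conf}, indent=4))
```

The printed JSON can be compared key by key with the pipeline settings to confirm that every entry refers to the same storage account.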

The pipeline fails with the following error:

org.apache.spark.sql.streaming.StreamingQueryException: [STREAM_FAILED] Query [id = 818323fc-80d5-4833-9f46-7d1afc9c5bf7, runId = 722e9aac-0fdd-4206-9d49-683bb151f0bf] terminated with exception: The container in the file event `{"backfill":{"bucket":"root@dbstoragelhdp7mflfxe2y","key":"5810201264315799/Data/Temp/xxxx.csv","size":1801,"eventTime":1682522202000,"newerThan$default$2":false}}` is different from expected by the source: `my_container@my_storageaccount`.
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:395)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.$anonfun$run$2(StreamExecution.scala:257)
....
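
The message compares two `container@account` pairs: the one in the file event and the one in the source path. The sketch below makes that comparison explicit using only the values quoted in this post. The function names are hypothetical, and the event layout (a `backfill` object with a `bucket` field) is assumed from the error text rather than from any documented format:

```python
import json

# Values copied verbatim from the query and the error message above.
SOURCE_PATH = "abfss://my_container@my_storageaccount.dfs.core.windows.net/test_csv/"
EVENT = (
    '{"backfill":{"bucket":"root@dbstoragelhdp7mflfxe2y",'
    '"key":"5810201264315799/Data/Temp/xxxx.csv","size":1801,'
    '"eventTime":1682522202000,"newerThan$default$2":false}}'
)


def expected_container(abfss_path):
    """Return 'container@account' from an abfss:// URI."""
    # Authority part looks like 'container@account.dfs.core.windows.net'.
    authority = abfss_path.split("://", 1)[1].split("/", 1)[0]
    container, _, host = authority.partition("@")
    account = host.split(".", 1)[0]
    return f"{container}@{account}"


def event_container(event_json):
    """Return the 'bucket' field of a backfill event (assumed layout)."""
    return json.loads(event_json)["backfill"]["bucket"]


if __name__ == "__main__":
    print("expected by source:", expected_container(SOURCE_PATH))
    print("found in event:    ", event_container(EVENT))
    print("mismatch:", expected_container(SOURCE_PATH) != event_container(EVENT))
```

Here the event refers to the `root` container of the storage account `dbstoragelhdp7mflfxe2y`, while the query points at `my_container` in `my_storageaccount`, so the file named in the event does not come from the path given to `cloud_files`.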

How can I fix this issue?

Thanks,