Issue loading CSV files from ADLS in DLT
05-01-2023 08:34 AM
I am using DLT (Delta Live Tables) to load CSV files from ADLS. Below is the SQL query in my notebook:
CREATE OR REFRESH STREAMING LIVE TABLE test_account_raw
AS SELECT * FROM cloud_files(
"abfss://my_container@my_storageaccount.dfs.core.windows.net/test_csv/",
"csv",
map("header", "true"));
Below is the configuration in my Delta Live Tables pipeline for accessing ADLS:
"configuration": {
"fs.azure.account.auth.type.my_storageaccount.dfs.core.windows.net": "OAuth",
"fs.azure.account.oauth.provider.type.my_storageaccount.dfs.core.windows.net": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
"fs.azure.account.oauth2.client.id.my_storageaccount.dfs.core.windows.net": "my_client_id",
"fs.azure.account.oauth2.client.secret.my_storageaccount.dfs.core.windows.net": "my_secret",
"fs.azure.account.oauth2.client.endpoint.my_storageaccount.dfs.core.windows.net": "https://login.microsoftonline.com/my_tenant_id/oauth2/token"
}
The pipeline fails with the following error:
org.apache.spark.sql.streaming.StreamingQueryException: [STREAM_FAILED] Query [id = 818323fc-80d5-4833-9f46-7d1afc9c5bf7, runId = 722e9aac-0fdd-4206-9d49-683bb151f0bf] terminated with exception: The container in the file event `{"backfill":{"bucket":"root@dbstoragelhdp7mflfxe2y","key":"5810201264315799/Data/Temp/xxxx.csv","size":1801,"eventTime":1682522202000,"newerThan$default$2":false}}` is different from expected by the source: `my_container@my_storageaccount`.
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:395)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.$anonfun$run$2(StreamExecution.scala:257)
....
How can I fix this issue?
Thanks,