09-11-2025 01:48 PM
We've been using Auto Loader (format "cloudFiles") to ingest data from an Azure storage account.
Today we started seeing failures during event notification setup:
25/09/11 19:06:28 ERROR MicroBatchExecution: Non-interrupted exception thrown for queryId=[REDACTED],runId=[REDACTED]: org.json4s.MappingException: Do not know how to convert JArray(List(JString([REDACTED]))) into class java.lang.String
org.json4s.MappingException: Do not know how to convert JArray(List(JString([REDACTED]))) into class java.lang.String
at org.json4s.reflect.package$.fail(package.scala:53)
at org.json4s.Extraction$.convert(Extraction.scala:888)
at org.json4s.Extraction$.$anonfun$extract$10(Extraction.scala:456)
at org.json4s.Extraction$.$anonfun$customOrElse$1(Extraction.scala:780)
at scala.PartialFunction.applyOrElse(PartialFunction.scala:127)
at scala.PartialFunction.applyOrElse$(PartialFunction.scala:126)
at scala.PartialFunction$$anon$1.applyOrElse(PartialFunction.scala:257)
at org.json4s.Extraction$.customOrElse(Extraction.scala:780)
at org.json4s.Extraction$.extract(Extraction.scala:454)
at org.json4s.Extraction$.org$json4s$Extraction$$extractDetectingNonTerminal(Extraction.scala:482)
at org.json4s.Extraction$.$anonfun$extract$8(Extraction.scala:426)
at scala.collection.immutable.List.map(List.scala:297)
at org.json4s.Extraction$.$anonfun$extract$7(Extraction.scala:424)
at org.json4s.Extraction$.$anonfun$customOrElse$1(Extraction.scala:780)
at scala.PartialFunction.applyOrElse(PartialFunction.scala:127)
at scala.PartialFunction.applyOrElse$(PartialFunction.scala:126)
at scala.PartialFunction$$anon$1.applyOrElse(PartialFunction.scala:257)
at org.json4s.Extraction$.customOrElse(Extraction.scala:780)
at org.json4s.Extraction$.extract(Extraction.scala:420)
at org.json4s.Extraction$.extract(Extraction.scala:56)
at org.json4s.ExtractableJsonAstNode.extract(ExtractableJsonAstNode.scala:22)
at org.json4s.jackson.JacksonSerialization.read(Serialization.scala:62)
at org.json4s.Serialization.read(Serialization.scala:31)
at org.json4s.Serialization.read$(Serialization.scala:31)
at org.json4s.jackson.JacksonSerialization.read(Serialization.scala:23)
at com.databricks.sql.aqs.EventGridClient.generateAccessTokenUsingClientSecret(EventGridClient.scala:180)
at com.databricks.sql.aqs.EventGridClient.generateAccessToken(EventGridClient.scala:238)
at com.databricks.sql.aqs.autoIngest.AzureEventNotificationSetup$.getToken(AzureEventNotificationSetup.scala:345)
at com.databricks.sql.aqs.autoIngest.AzureEventNotificationSetup$.$anonfun$buildStorageClient$2(AzureEventNotificationSetup.scala:387)
at scala.Option.getOrElse(Option.scala:189)
at com.databricks.sql.aqs.autoIngest.AzureEventNotificationSetup$.buildStorageClient(AzureEventNotificationSetup.scala:384)
at com.databricks.sql.aqs.autoIngest.AzureEventNotificationSetup.<init>(AzureEventNotificationSetup.scala:70)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481)
at com.databricks.sql.fileNotification.autoIngest.EventNotificationSetup$.$anonfun$create$1(EventNotificationSetup.scala:68)
at com.databricks.sql.fileNotification.autoIngest.ResourceManagementUtils$.unwrapInvocationTargetException(ResourceManagementUtils.scala:42)
at com.databricks.sql.fileNotification.autoIngest.EventNotificationSetup$.create(EventNotificationSetup.scala:50)
at com.databricks.sql.fileNotification.autoIngest.CloudFilesSourceProvider.$anonfun$createSource$2(CloudFilesSourceProvider.scala:143)
at scala.Option.getOrElse(Option.scala:189)
at com.databricks.sql.fileNotification.autoIngest.CloudFilesSourceProvider.createSource(CloudFilesSourceProvider.scala:128)
at org.apache.spark.sql.execution.datasources.DataSource.createSource(DataSource.scala:346)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$2.$anonfun$applyOrElse$2(MicroBatchExecution.scala:223)
...
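The exception message says that a JSON value arrived as a one-element array of strings where the deserializer expected a plain string. The Databricks client code is not visible here, so the following is only a minimal Python sketch of that kind of shape mismatch; the field name `example_field` and both helper functions are hypothetical and do not reflect the actual token response or the actual integration code.

```python
import json


def read_string_field(payload: str, field: str) -> str:
    """Strict reader: fail if the field is anything other than a string."""
    value = json.loads(payload)[field]
    if not isinstance(value, str):
        raise TypeError(
            f"cannot convert {type(value).__name__} into str for field {field!r}"
        )
    return value


def read_string_field_lenient(payload: str, field: str) -> str:
    """Lenient reader: also accept a one-element list containing a string."""
    value = json.loads(payload)[field]
    if isinstance(value, str):
        return value
    if isinstance(value, list) and len(value) == 1 and isinstance(value[0], str):
        return value[0]
    raise TypeError(
        f"cannot convert {type(value).__name__} into str for field {field!r}"
    )


# Hypothetical payloads: the same field as a string and as a one-element array.
as_string = '{"example_field": "abc"}'
as_array = '{"example_field": ["abc"]}'

print(read_string_field(as_string, "example_field"))
print(read_string_field_lenient(as_array, "example_field"))
```

A strict reader of this kind fails as soon as the server wraps the value in an array, which is consistent with the message in the trace, although the trace alone does not show which field changed.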
09-11-2025 03:46 PM
Hello @Malthe
Thank you for sharing the error.
The frame in the stack trace that drew my attention is:
.EventGridClient.generateAccessTokenUsingClientSecret
- Ensure your service principal has the minimum required Azure RBAC roles (these are not app roles in Azure AD; they are resource-level permissions):

| Role | Scope | Purpose |
| --- | --- | --- |
| Storage Blob Data Contributor | Storage account | Read/write blobs for file discovery. |
| Storage Queue Data Contributor | Storage account | Manage queues for notifications (if not using connection string). |
| EventGrid EventSubscription Contributor | Resource group (or subscription) | Create/read/delete Event Grid subscriptions. |
| Contributor | Storage account and resource group | General setup (broader; use if custom roles fail). |

- Assign these via Azure Portal > Storage Account/Resource Group > Access Control (IAM) > Add role assignment > Select service principal.
Also, remove unnecessary app role assignments (likely root cause)
09-11-2025 09:34 PM - edited 09-11-2025 09:36 PM
These are the current role assignments of this service principal:
They seem to be right, and also:
- This is just an intermittent error;
- There's an event subscription on the storage queue (with a matching query id from the error message).
Could it be that somehow the Azure Management Endpoint for the event grid is returning a different kind of response all of a sudden? This is a traceback from Databricks' own integration code, so there isn't much to go on here.
09-12-2025 03:06 AM
I'm having exactly the same issue, with multiple pipelines in 3 different environments, starting approximately 11.9.2025 10.00 EEST.
09-12-2025 01:15 PM
Hi, we’re seeing the same issue on several queue-based ingestion jobs, with a couple of hundred tasks failing. It was intermittent yesterday (10 Sep 2025), with a few random tasks failing in each run, but it has now spread so that every task fails on every run. I’ve given the service principal all the roles suggested above, to no avail. I suspect it could have to do with a change in the Azure Event Grid response structure.
09-12-2025 01:24 PM - edited 09-12-2025 01:25 PM
I'm seeing these two updates from Microsoft on 10 Sep 2025:
- Azure Storage APIs gain Entra ID and RBAC support
- Automatic Identity Management (AIM) for Entra ID on Azure Databricks
Both seem like candidates.
09-13-2025 05:08 AM
Falling back to file listing mode worked as a band-aid, but I don't see it as a long-term solution because of the cost of file listing operations (especially with a large number of files).
In practice, I removed the Event Grid related options from the stream reader:
cloudFiles.useNotifications
cloudFiles.resourceGroup
cloudFiles.subscriptionId
cloudFiles.clientId
cloudFiles.clientSecret
cloudFiles.tenantId
My top candidate would then be the Azure Storage API changes shared by @Malthe
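The workaround above amounts to passing or omitting the notification options when building the stream. A minimal sketch of how that toggle could be kept in one place, assuming a hypothetical helper `build_autoloader_options` (not part of any Databricks API) and placeholder values:

```python
from typing import Dict, Optional


def build_autoloader_options(
    file_format: str,
    use_notifications: bool,
    notification_settings: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Return Auto Loader options, with notification settings only when enabled.

    Hypothetical helper for illustration. When use_notifications is False,
    every notification-related option is left out, which corresponds to the
    file listing fallback described above.
    """
    options = {"cloudFiles.format": file_format}
    if use_notifications:
        options["cloudFiles.useNotifications"] = "true"
        options.update(notification_settings or {})
    return options


# Placeholder values; the real ones come from your own configuration.
settings = {
    "cloudFiles.resourceGroup": "my-rg",
    "cloudFiles.subscriptionId": "my-sub-id",
}

listing_mode = build_autoloader_options("json", False, settings)
notification_mode = build_autoloader_options("json", True, settings)
print(listing_mode)
print(notification_mode)

# On a Databricks cluster, the dict could then be applied with:
#   spark.readStream.format("cloudFiles").options(**listing_mode).load(path)
```

Keeping the switch in a single function makes it easy to revert to notification mode once the error stops occurring.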
09-13-2025 05:19 AM
The problem seems to have resolved itself today on our platform. Did you try the previous ingestion again (before using the workaround)?
09-13-2025 05:28 AM
Thanks! Interesting, it now works on our platform too.
09-13-2025 05:28 AM
That is, I reverted to Event Grid mode and it works.
09-13-2025 11:49 AM
Hello @Malthe @Saska @MehdiJafari
When it was showing this error, did you try any alternative methods, such as using Databricks service credentials?
For example:
# Parentheses let the chained calls span multiple lines.
stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    # Service credential used in place of clientId/clientSecret/tenantId.
    .option("databricks.serviceCredential", "my-storage-credential")
    .option("cloudFiles.useNotifications", "true")
    .option("cloudFiles.resourceGroup", "my-rg")
    .option("cloudFiles.subscriptionId", "my-sub-id")
    .load("abfss://container@storageaccount.dfs.core.windows.net/path/")
)