
Intermittent failures with auto loader on Azure

Malthe
Contributor

We've been using auto loader to ingest data from a storage account on Azure (format "cloudFiles").

Today, we're starting to see failures during the setup of event notification:

25/09/11 19:06:28 ERROR MicroBatchExecution: Non-interrupted exception thrown for queryId=[REDACTED],runId=[REDACTED]: org.json4s.MappingException: Do not know how to convert JArray(List(JString([REDACTED]))) into class java.lang.String
org.json4s.MappingException: Do not know how to convert JArray(List(JString([REDACTED]))) into class java.lang.String
	at org.json4s.reflect.package$.fail(package.scala:53)
	at org.json4s.Extraction$.convert(Extraction.scala:888)
	at org.json4s.Extraction$.$anonfun$extract$10(Extraction.scala:456)
	at org.json4s.Extraction$.$anonfun$customOrElse$1(Extraction.scala:780)
	at scala.PartialFunction.applyOrElse(PartialFunction.scala:127)
	at scala.PartialFunction.applyOrElse$(PartialFunction.scala:126)
	at scala.PartialFunction$$anon$1.applyOrElse(PartialFunction.scala:257)
	at org.json4s.Extraction$.customOrElse(Extraction.scala:780)
	at org.json4s.Extraction$.extract(Extraction.scala:454)
	at org.json4s.Extraction$.org$json4s$Extraction$$extractDetectingNonTerminal(Extraction.scala:482)
	at org.json4s.Extraction$.$anonfun$extract$8(Extraction.scala:426)
	at scala.collection.immutable.List.map(List.scala:297)
	at org.json4s.Extraction$.$anonfun$extract$7(Extraction.scala:424)
	at org.json4s.Extraction$.$anonfun$customOrElse$1(Extraction.scala:780)
	at scala.PartialFunction.applyOrElse(PartialFunction.scala:127)
	at scala.PartialFunction.applyOrElse$(PartialFunction.scala:126)
	at scala.PartialFunction$$anon$1.applyOrElse(PartialFunction.scala:257)
	at org.json4s.Extraction$.customOrElse(Extraction.scala:780)
	at org.json4s.Extraction$.extract(Extraction.scala:420)
	at org.json4s.Extraction$.extract(Extraction.scala:56)
	at org.json4s.ExtractableJsonAstNode.extract(ExtractableJsonAstNode.scala:22)
	at org.json4s.jackson.JacksonSerialization.read(Serialization.scala:62)
	at org.json4s.Serialization.read(Serialization.scala:31)
	at org.json4s.Serialization.read$(Serialization.scala:31)
	at org.json4s.jackson.JacksonSerialization.read(Serialization.scala:23)
	at com.databricks.sql.aqs.EventGridClient.generateAccessTokenUsingClientSecret(EventGridClient.scala:180)
	at com.databricks.sql.aqs.EventGridClient.generateAccessToken(EventGridClient.scala:238)
	at com.databricks.sql.aqs.autoIngest.AzureEventNotificationSetup$.getToken(AzureEventNotificationSetup.scala:345)
	at com.databricks.sql.aqs.autoIngest.AzureEventNotificationSetup$.$anonfun$buildStorageClient$2(AzureEventNotificationSetup.scala:387)
	at scala.Option.getOrElse(Option.scala:189)
	at com.databricks.sql.aqs.autoIngest.AzureEventNotificationSetup$.buildStorageClient(AzureEventNotificationSetup.scala:384)
	at com.databricks.sql.aqs.autoIngest.AzureEventNotificationSetup.<init>(AzureEventNotificationSetup.scala:70)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481)
	at com.databricks.sql.fileNotification.autoIngest.EventNotificationSetup$.$anonfun$create$1(EventNotificationSetup.scala:68)
	at com.databricks.sql.fileNotification.autoIngest.ResourceManagementUtils$.unwrapInvocationTargetException(ResourceManagementUtils.scala:42)
	at com.databricks.sql.fileNotification.autoIngest.EventNotificationSetup$.create(EventNotificationSetup.scala:50)
	at com.databricks.sql.fileNotification.autoIngest.CloudFilesSourceProvider.$anonfun$createSource$2(CloudFilesSourceProvider.scala:143)
	at scala.Option.getOrElse(Option.scala:189)
	at com.databricks.sql.fileNotification.autoIngest.CloudFilesSourceProvider.createSource(CloudFilesSourceProvider.scala:128)
	at org.apache.spark.sql.execution.datasources.DataSource.createSource(DataSource.scala:346)
	at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$2.$anonfun$applyOrElse$2(MicroBatchExecution.scala:223)
...
1 ACCEPTED SOLUTION

Accepted Solutions

Saska
New Contributor

Falling back to file listing mode worked as a band-aid, but I don't see it as a long-term solution because of the cost of file listing operations (especially with a large number of files).

In practice, I removed the Event Grid-related options from the stream reader:

cloudFiles.useNotifications
cloudFiles.resourceGroup
cloudFiles.subscriptionId
cloudFiles.clientId
cloudFiles.clientSecret
cloudFiles.tenantId

My top candidate would then be the Azure Storage API changes shared by @Malthe 
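
A minimal sketch of that fallback, assuming the reader options are kept in a plain dictionary. The helper name `reader_options` and the placeholder values are hypothetical; only the six option keys come from the list above. With the notification options removed, Auto Loader uses directory listing.

```python
# Option keys taken from the list above; everything else is illustrative.
NOTIFICATION_OPTIONS = {
    "cloudFiles.useNotifications",
    "cloudFiles.resourceGroup",
    "cloudFiles.subscriptionId",
    "cloudFiles.clientId",
    "cloudFiles.clientSecret",
    "cloudFiles.tenantId",
}


def reader_options(options, use_notifications):
    """Return a copy of the options, dropping Event Grid settings when disabled."""
    if use_notifications:
        return dict(options)
    return {k: v for k, v in options.items() if k not in NOTIFICATION_OPTIONS}


# Hypothetical configuration with placeholder values.
full_options = {
    "cloudFiles.format": "json",
    "cloudFiles.useNotifications": "true",
    "cloudFiles.resourceGroup": "my-rg",
    "cloudFiles.subscriptionId": "my-sub-id",
    "cloudFiles.clientId": "my-client-id",
    "cloudFiles.clientSecret": "my-client-secret",
    "cloudFiles.tenantId": "my-tenant-id",
}

listing_options = reader_options(full_options, use_notifications=False)

# On a Databricks cluster (not executed here; source_path is a placeholder):
# stream = (
#     spark.readStream.format("cloudFiles")
#     .options(**listing_options)
#     .load(source_path)
# )
```

Keeping the switch in one place makes it easy to go back to notification mode once the underlying issue is fixed.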


Khaja_Zaffer
Contributor

Hello  @Malthe 

Thank you for sharing the error. One frame in the stack trace that drew my attention is:

.EventGridClient.generateAccessTokenUsingClientSecret

Can you please verify Service Principal Permissions:
  • Ensure your service principal has the minimum required Azure RBAC roles (these are not app roles in Azure AD; they are resource-level permissions):
    Role                                    | Scope                              | Purpose
    ----------------------------------------|------------------------------------|--------------------------------------------------------------
    Storage Blob Data Contributor           | Storage account                    | Read/write blobs for file discovery.
    Storage Queue Data Contributor          | Storage account                    | Manage queues for notifications (if not using connection string).
    EventGrid EventSubscription Contributor | Resource group (or subscription)   | Create/read/delete Event Grid subscriptions.
    Contributor                             | Storage account and resource group | General setup (broader; use if custom roles fail).
  • Assign these via Azure Portal > Storage Account/Resource Group > Access Control (IAM) > Add role assignment > Select service principal.

Also, remove any unnecessary app role assignments (a likely root cause).

Malthe
Contributor

These are the current role assignments of this service principal:

[Screenshot: role assignments of the service principal]

Seems to be right and also:

  1. This is just an intermittent error;
  2. There's an event subscription on the storage queue (with a matching query id from the error message).

Could it be that somehow the Azure Management Endpoint for the event grid is returning a different kind of response all of a sudden? This is a traceback from Databricks' own integration code, so there isn't much to go on here.
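
To illustrate what the exception describes: the JSON reader expected a string but found a one-element array of strings (`JArray(List(JString(...)))`), which would be consistent with a response field changing shape. Below is a minimal Python sketch of a reader that tolerates both shapes. This is not Databricks' code; the field name `some_field` is made up, since the real field is redacted in the log.

```python
import json


def as_single_string(value):
    """Accept either "x" or ["x"] and return "x"; reject anything else."""
    if isinstance(value, str):
        return value
    if isinstance(value, list) and len(value) == 1 and isinstance(value[0], str):
        return value[0]
    raise TypeError(
        f"expected a string or a one-element list of strings, got {value!r}"
    )


# Hypothetical payloads: the same field as a string and as a one-element array.
old_shape = json.loads('{"some_field": "value"}')
new_shape = json.loads('{"some_field": ["value"]}')

old_value = as_single_string(old_shape["some_field"])
new_value = as_single_string(new_shape["some_field"])
```

A strict reader that only accepts the first shape would fail on the second, which is the kind of failure the traceback shows.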

Saska
New Contributor

I'm having exactly the same issue, with multiple pipelines in 3 different environments, starting at approximately 10:00 EEST on 11 Sep 2025.

MehdiJafari
New Contributor

Hi, we're seeing the same issue on several queue-based ingestion jobs, with a couple of hundred tasks failing. Yesterday (10 Sep 2025) it was intermittent, with a few random tasks failing in each run, but it has now spread so that all tasks fail on every run. I've given the service principal all the roles suggested above, but to no avail. I suspect it has to do with a change in the Azure Event Grid response structure.

Malthe
Contributor

I'm seeing these two updates from Microsoft on 10 Sep 2025:

Both seem like candidates.

The problem seems to have resolved itself today on our platform. Did you try the previous ingestion again (before using the workaround)?

Thanks! Interesting, it now works on our platform too.

Saska
New Contributor

That is, I reverted to Event Grid mode and it works.

Khaja_Zaffer
Contributor

Hello @Malthe @Saska @MehdiJafari

When you were seeing this error, did you try any alternative methods, such as using Databricks service credentials?

For example:

# Parentheses let the chained calls span multiple lines.
stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("databricks.serviceCredential", "my-storage-credential")
    .option("cloudFiles.useNotifications", "true")
    .option("cloudFiles.resourceGroup", "my-rg")
    .option("cloudFiles.subscriptionId", "my-sub-id")
    .load("abfss://container@storageaccount.dfs.core.windows.net/path/")
)