Databricks Community

Jothia · ‎11-20-2025

Hi all,

We are intermittently hitting errors when reading from a storage account through a Unity Catalog external table. The storage account holds data streamed from Dataverse. The failure does not happen on every run, and the same workload ran fine with Hive.

An error occurred while calling o393.sql.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 84.0 failed 4 times, most recent failure: Lost task 0.3 in stage 84.0 (TID 171) (10.152.188.38 executor 0): com.databricks.sql.io.FileReadException: Error while reading file abfss:REDACTED_LOCAL_PART@storage002.dfs.core.windows.net/account/2019-06.csv.

Any suggestions?

Regards,

Jothi.

Raman_Unifeye · ‎11-21-2025

Two things to check:

One:

Double-check that you are not trying to authenticate with two different methods (e.g., a cluster credential trying to override the Unity Catalog credentials).

The previous Hive setup likely relied on a Cluster-Scoped Service Principal or Shared Access Signature (SAS) key configured directly in the cluster's Spark configuration (e.g., spark.hadoop.fs.azure.account.auth.type). Unity Catalog ignores these cluster-scoped secrets for paths defined in its External Locations. If the table is an External Table managed by Unity Catalog, you must rely on the credentials defined in the External Location.
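As a rough way to spot leftover cluster-level storage credentials, the sketch below filters a list of Spark config entries for ABFS settings (keys under `fs.azure.`, with or without the `spark.hadoop.` prefix). The function name `find_legacy_storage_auth_keys` is hypothetical, and the commented call to `spark.sparkContext.getConf().getAll()` assumes a cluster access mode where the SparkContext is reachable, which may not be the case on every Unity Catalog cluster.

```python
from typing import Iterable, List, Tuple

# Prefix used when Hadoop settings are supplied through the cluster's Spark config.
SPARK_HADOOP_PREFIX = "spark.hadoop."
# ABFS driver settings (account keys, SAS, OAuth) live under this Hadoop namespace.
ABFS_PREFIX = "fs.azure."


def find_legacy_storage_auth_keys(conf_items: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the names of config keys that set ABFS options at cluster or session level.

    Hypothetical helper: only key names are returned, never the values,
    so secrets are not echoed into notebook output.
    """
    hits = []
    for key, _value in conf_items:
        bare = key[len(SPARK_HADOOP_PREFIX):] if key.startswith(SPARK_HADOOP_PREFIX) else key
        if bare.startswith(ABFS_PREFIX):
            hits.append(key)
    return sorted(hits)


# Assumed usage in a notebook where the SparkContext is accessible:
#   leftovers = find_legacy_storage_auth_keys(spark.sparkContext.getConf().getAll())
#   print(leftovers)
```

An empty result suggests the cluster is not carrying over the old Hive-era credentials; any hits are candidates for removal so that only the External Location credential is in play.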

Two:

Are you using Auto Loader?

If the Dataverse stream creates many very small files or is currently in the process of writing/overwriting a file when Spark tries to read it, it can cause transient read failures.

Use Auto Loader if possible, as it handles file discovery and eventual consistency better.

RG #Driving Business Outcomes with Data Intelligence

Jothia · ‎11-21-2025

@Raman_Unifeye Thanks for your response. The external table uses only an external location path under UC. Authentication does not look like the issue either, since we do not get the error every time.

Ashwin_DSA · 4 weeks ago

Hi @Jothia,

Apologies for the very delayed response here. Appreciate this was raised in November 2025. I wanted to close the loop in case you were still expecting an answer.

From what you described, this does not look like a straightforward Unity Catalog authentication issue. With Unity Catalog external tables, Databricks governs access to the table metadata and permissions, but the underlying files still live in your cloud storage, and Unity Catalog does not manage the lifecycle or layout of those files. Databricks also generally recommends managed tables over external tables for most workloads because they are operationally more robust and benefit from more built-in optimisations.

The crucial detail in your case is that the failure is intermittent. If this were a permissions issue or an external location configuration issue, I would expect it to fail consistently. When a query sometimes succeeds and sometimes throws com.databricks.sql.io.FileReadException against an abfss://...csv file, that usually points more to the underlying file being replaced, updated, or temporarily unavailable while Spark is reading it, rather than Unity Catalog itself being unable to authorise access.

So my guess is that Unity Catalog is probably surfacing an issue in the storage or file update pattern, rather than being the root cause on its own. Hive may have appeared to work better before, but UC external tables still ultimately read the same files from ADLS, and raw external files are simply a less robust approach when the source system is actively updating them.

If this use case is still active, there are now better patterns for bringing Dataverse data into Databricks. The most purpose-built option is the managed Microsoft Dynamics 365 / Dataverse connector in Lakeflow Connect, which is designed to work with Azure Synapse Link for Dataverse. That connector reads the Dataverse exports from ADLS, uses the changelog metadata for incremental ingestion, and lands the data into Databricks tables instead of leaving you to query the raw exported CSV files directly. Databricks also documents the connector's current limitations and operational considerations.

If you want to stay with a file-based pattern in ADLS, then Auto Loader is also a much better fit than querying live CSV exports through an external table. Auto Loader is designed to incrementally ingest new files from cloud storage, supports ADLS paths, and tracks ingestion progress with checkpoint state so that downstream users query Delta tables instead of reading directly from raw files that may still be changing. Databricks also notes that Auto Loader works best with immutable arriving files, which is exactly why it is a stronger ingestion pattern here than reading live CSV exports in place.
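A minimal sketch of such an Auto Loader pipeline is below. It assumes the `spark` session that Databricks notebooks provide, relies on schema inference for the CSV files, and uses hypothetical names throughout (`ingest_csv_to_delta`, `autoloader_options`, and whatever source path, checkpoint path, and target table you pass in). Whether the Dataverse export files carry headers or need an explicit schema is not covered here.

```python
from typing import Any, Dict


def autoloader_options(schema_location: str) -> Dict[str, str]:
    """Auto Loader reader options for CSV input with an inferred, tracked schema."""
    return {
        "cloudFiles.format": "csv",
        # Where Auto Loader stores the inferred schema between runs.
        "cloudFiles.schemaLocation": schema_location,
    }


def ingest_csv_to_delta(spark: Any, source_path: str, target_table: str, checkpoint_path: str):
    """Incrementally load new CSV files from source_path into a Delta table.

    Hypothetical helper: all paths and the table name are supplied by the caller.
    """
    raw = (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_options(checkpoint_path + "/_schema"))
        .load(source_path)
    )
    return (
        raw.writeStream
        # The checkpoint records which files have already been ingested.
        .option("checkpointLocation", checkpoint_path)
        # Process everything currently available, then stop (batch-style run).
        .trigger(availableNow=True)
        .toTable(target_table)
    )


# Assumed usage in a notebook (placeholders, not real paths):
#   query = ingest_csv_to_delta(spark, "<abfss export folder>", "<catalog.schema.table>", "<checkpoint path>")
#   query.awaitTermination()
```

Downstream queries would then read the Delta table named in `target_table` rather than the CSV files themselves.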

In other words, if the current approach is "Dataverse exports CSV to ADLS, and Databricks queries those CSVs in place through a UC external table," I would treat that as a stopgap. The more robust long-term design is to ingest that data into Delta-managed tables, ideally via Lakeflow Connect for Dynamics 365/Dataverse, or otherwise through an Auto Loader-based ingestion pipeline before downstream users query it.

If you do need to stay on the current pattern for now, I would check whether the CSV files are ever overwritten in place, whether Synapse Link or another upstream process can still be writing into the same folder while queries are running, and whether the issue disappears when reading a static snapshot of the export rather than the live path. If the issue goes away on a static snapshot, that is a strong signal that the problem is with concurrent file changes rather than Unity Catalog configuration.
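One rough way to test for in-place overwrites is to list the export folder before and after a query and compare file sizes and modification times. The sketch below assumes the listing objects expose `path`, `size`, and `modificationTime` attributes, as the entries returned by `dbutils.fs.ls` do on recent Databricks runtimes; the helper names `snapshot` and `changed_files` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# path -> (size in bytes, modification time as reported by the listing)
Snapshot = Dict[str, Tuple[int, int]]


def snapshot(file_infos: Iterable) -> Snapshot:
    """Build a snapshot from objects exposing .path, .size and .modificationTime."""
    return {f.path: (f.size, f.modificationTime) for f in file_infos}


def changed_files(before: Snapshot, after: Snapshot) -> List[str]:
    """Paths present in both snapshots whose size or modification time differs."""
    return sorted(p for p in before.keys() & after.keys() if before[p] != after[p])


# Assumed usage in a notebook (export_dir is a placeholder for the live export folder):
#   before = snapshot(dbutils.fs.ls(export_dir))
#   ... run the failing query ...
#   after = snapshot(dbutils.fs.ls(export_dir))
#   print(changed_files(before, after))
```

If the file named in the FileReadException shows up in the changed list, that supports the concurrent-write explanation over a configuration problem.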

Hope this helps.

If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***

naveenayalla · 4 weeks ago

It is most likely one of these:

The external location path in Unity Catalog doesn't exactly match your ABFS path

The storage credential is missing the Storage Blob Data Contributor role on that container

The service principal token is expiring mid-job

The table LOCATION path changed slightly during migration

Start with the external location and storage credential in Unity Catalog; that's the 90% case here.
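The path-mismatch checks can be approximated with a small comparison of the table's LOCATION against the external location URL. This is only a sketch: `is_under_external_location` is a hypothetical helper, it assumes both values are plain `abfss://container@account.dfs.core.windows.net/...` URLs that you have copied from the table and external location details, and it does not reproduce Unity Catalog's own path resolution.

```python
from urllib.parse import urlsplit


def is_under_external_location(table_location: str, external_location_url: str) -> bool:
    """True if table_location is the external location itself or a path beneath it."""
    table, ext = urlsplit(table_location), urlsplit(external_location_url)
    # Scheme and container@account host must match (compared case-insensitively).
    if (table.scheme.lower(), table.netloc.lower()) != (ext.scheme.lower(), ext.netloc.lower()):
        return False
    base = ext.path.rstrip("/")
    path = table.path.rstrip("/")
    # Require a full path-segment match so "/dataverse2" is not treated as under "/dataverse".
    return path == base or path.startswith(base + "/")


# Example with placeholder values:
#   is_under_external_location(
#       "abfss://data@acct.dfs.core.windows.net/dataverse/account",
#       "abfss://data@acct.dfs.core.windows.net/dataverse",
#   )
```

A `False` result for the table in question would point to a LOCATION that drifted during migration.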

ashukasma · 3 weeks ago

This issue appears to be related to Azure Storage access through Unity Catalog rather than the data itself, especially since the same workload was working fine with Hive and the failure is intermittent.

A few areas worth checking:

1. Storage Credential Configuration
Verify that the Storage Credential and External Location used by Unity Catalog are configured correctly.
If you're using a Service Principal, ensure the client secret has not expired and has the required permissions on the storage account.
2. Transient Authentication or Token Refresh Issues
Intermittent failures can sometimes occur due to token refresh or credential caching issues within Unity Catalog.
Check if the failures happen after long-running cluster sessions.
3. Azure Storage Throttling or Network Connectivity
Review Azure Storage metrics for throttling, timeouts, or connectivity-related errors.
Verify whether the issue affects random files or specific files only.
4. File-Level Validation
Since the error references a specific CSV file, confirm that the file is not corrupted and can be accessed directly from the storage account.
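To see whether failures cluster on specific files or are spread randomly, the failing paths can be tallied from collected error messages. The sketch below assumes the messages contain the phrase "Error while reading file" followed by the path, as in the error quoted at the top of this thread; `failing_files` is a hypothetical helper, and how you collect the messages (job logs, notebook output) is up to you.

```python
import re
from collections import Counter
from typing import Iterable

# Captures the path that follows the FileReadException phrase seen in the quoted error.
_FILE_RE = re.compile(r"Error while reading file (\S+)")


def failing_files(error_messages: Iterable[str]) -> Counter:
    """Count how often each file path appears in FileReadException-style messages."""
    counts: Counter = Counter()
    for msg in error_messages:
        for match in _FILE_RE.finditer(msg):
            # Drop the sentence-ending period that follows the path in the message.
            counts[match.group(1).rstrip(".")] += 1
    return counts


# Assumed usage, with messages gathered from failed runs:
#   for path, n in failing_files(messages).most_common():
#       print(n, path)
```

A handful of paths recurring across runs would point at particular files (for example, ones still being written), while an even spread would fit throttling or connectivity problems better.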

Aashish Kasma | CTO & Cofounder, Lucent Innovation
