
Databricks Access Issue with UC

Jothia
New Contributor III

Hi All,

We are intermittently getting errors when reading, through a Unity Catalog external table, from the storage account that receives streamed data from Dataverse. The error does not occur every time. The same read ran fine with Hive.

An error occurred while calling o393.sql.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 84.0 failed 4 times, most recent failure: Lost task 0.3 in stage 84.0 (TID 171) (10.152.188.38 executor 0): com.databricks.sql.io.FileReadException: Error while reading file abfss:REDACTED_LOCAL_PART@storage002.dfs.core.windows.net/account/2019-06.csv.

 

Any suggestions?

Regards,

Jothi.

 

3 REPLIES

Raman_Unifeye
Honored Contributor III

Two things to check:

One:

Double-check that you are not trying to authenticate with two different methods (e.g., a cluster credential trying to override the Unity Catalog credentials).

The previous Hive setup likely relied on a cluster-scoped service principal or Shared Access Signature (SAS) token configured directly in the cluster's Spark configuration (e.g., spark.hadoop.fs.azure.account.auth.type). Unity Catalog ignores these cluster-scoped secrets for paths defined in its External Locations. If the table is an external table registered in Unity Catalog, you must rely on the credentials defined in the External Location.
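One way to check for leftover cluster-scoped storage settings is to scan the cluster's Spark configuration for ABFS keys. This is only a minimal sketch, assuming the configuration is available as (key, value) pairs; the helper name `find_abfs_credential_keys` and the sample values are hypothetical and not from your setup:

```python
from typing import Iterable, List, Tuple

# Substring shared by the Hadoop ABFS settings (auth type, account key,
# OAuth client details, SAS) that clusters often carry over from Hive setups.
ABFS_MARKER = "fs.azure."


def find_abfs_credential_keys(conf_pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the sorted config keys that look like cluster-scoped ABFS settings."""
    return sorted(key for key, _ in conf_pairs if ABFS_MARKER in key)


# Hypothetical sample. On a cluster, the pairs could come from
# spark.sparkContext.getConf().getAll(), where that API is accessible.
sample_conf = [
    ("spark.hadoop.fs.azure.account.auth.type", "OAuth"),
    ("spark.sql.shuffle.partitions", "200"),
]
print(find_abfs_credential_keys(sample_conf))
```

If the list is non-empty on the cluster that reads the external table, those keys are worth reviewing and possibly removing.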

Two:

Are you using Auto Loader?

If the Dataverse stream creates many very small files, or is in the middle of writing or overwriting a file when Spark tries to read it, that can cause transient read failures.

Use Auto Loader if possible, as it handles file discovery and eventual consistency better.

 

 


RG #Driving Business Outcomes with Data Intelligence

Jothia
New Contributor III

@Raman_Unifeye Thanks for your response. We are only using an external location path under UC for the external table. There do not appear to be any authentication issues either, as we do not get the error every time.

Ashwin_DSA
Databricks Employee

Hi @Jothia,

Apologies for the very delayed response here. I appreciate this was raised in November 2025, but I wanted to close the loop in case you were still expecting an answer.

From what you described, this does not look like a straightforward Unity Catalog authentication issue. With Unity Catalog external tables, Databricks governs access to the table metadata and permissions, but the underlying files still live in your cloud storage, and Unity Catalog does not manage the lifecycle or layout of those files. Databricks also generally recommends managed tables over external tables for most workloads because they are operationally more robust and benefit from more built-in optimisations.

The crucial detail in your case is that the failure is intermittent. If this were a permissions issue or an external location configuration issue, I would expect it to fail consistently. When a query sometimes succeeds and sometimes throws com.databricks.sql.io.FileReadException against an abfss://...csv file, that usually points more to the underlying file being replaced, updated, or temporarily unavailable while Spark is reading it, rather than Unity Catalog itself being unable to authorise access.

So my guess is that Unity Catalog is probably surfacing an issue in the storage or file update pattern, rather than being the root cause on its own. Hive may have appeared to work better before, but UC external tables still ultimately read the same files from ADLS, and raw external files are simply a less robust approach when the source system is actively updating them.

If this use case is still active, there are now better patterns for bringing Dataverse data into Databricks. The most purpose-built option is the managed Microsoft Dynamics 365 / Dataverse connector in Lakeflow Connect, which is designed to work with Azure Synapse Link for Dataverse. That connector reads the Dataverse exports from ADLS, uses the changelog metadata for incremental ingestion, and lands the data into Databricks tables instead of leaving you to query the raw exported CSV files directly. Databricks also documents the current limitations and operational considerations for this connector.

If you want to stay with a file-based pattern in ADLS, then Auto Loader is also a much better fit than querying live CSV exports through an external table. Auto Loader is designed to incrementally ingest new files from cloud storage, supports ADLS paths, and tracks ingestion progress with checkpoint state so that downstream users query Delta tables instead of reading directly from raw files that may still be changing. Databricks also notes that Auto Loader works best with immutable arriving files, which is exactly why it is a stronger ingestion pattern here than reading live CSV exports in place.
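As a rough illustration of the Auto Loader pattern, here is a minimal sketch rather than a production pipeline. The function names, paths, and target table are hypothetical, and any extra CSV options (header, delimiter, and so on) depend on how your export is written, so treat them as assumptions to verify:

```python
from typing import Dict, Optional


def autoloader_csv_options(schema_location: str,
                           extra: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Build reader options for Auto Loader ('cloudFiles') over CSV files."""
    options = {
        "cloudFiles.format": "csv",
        # Where Auto Loader stores the inferred schema between runs.
        "cloudFiles.schemaLocation": schema_location,
    }
    options.update(extra or {})
    return options


def start_csv_ingest(spark, source_path, schema_location,
                     checkpoint_location, target_table):
    """Load files not yet ingested into a Delta table, then stop."""
    stream = (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_csv_options(schema_location))
        .load(source_path)
    )
    return (
        stream.writeStream
        # The checkpoint records which files have already been processed.
        .option("checkpointLocation", checkpoint_location)
        .trigger(availableNow=True)
        .toTable(target_table)
    )
```

On a Databricks cluster you would call `start_csv_ingest(spark, ...)` with your own abfss:// source folder, schema and checkpoint locations, and a Unity Catalog table name. Downstream queries then read the Delta table rather than the live CSV folder.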

In other words, if the current approach is "Dataverse exports CSV to ADLS, and Databricks queries those CSVs in place through a UC external table," I would treat that as a stopgap. The more robust long-term design is to ingest that data into Delta-managed tables, ideally via Lakeflow Connect for Dynamics 365/Dataverse, or otherwise through an Auto Loader-based ingestion pipeline before downstream users query it.

If you do need to stay on the current pattern for now, I would check whether the CSV files are ever overwritten in place, whether Synapse Link or another upstream process can still be writing into the same folder while queries are running, and whether the issue disappears when reading a static snapshot of the export rather than the live path. If the issue goes away on a static snapshot, that is a strong signal that the problem is with concurrent file changes rather than Unity Catalog configuration.
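To check for in-place overwrites, one option is to list the export folder twice, a few minutes apart, and compare the results. The sketch below assumes each listing is a mapping from file path to (size, modification time); the helper name and the sample values are hypothetical:

```python
from typing import Dict, List, Tuple

# path -> (size in bytes, modification time), taken from a storage listing.
Listing = Dict[str, Tuple[int, int]]


def files_changed_in_place(before: Listing, after: Listing) -> List[str]:
    """Return paths present in both listings whose size or modification time differs."""
    return sorted(
        path for path, meta in after.items()
        if path in before and before[path] != meta
    )


# Hypothetical listings taken a few minutes apart. On Databricks they could be
# built from dbutils.fs.ls(folder) as {f.path: (f.size, f.modificationTime)}.
first = {
    "account/2019-06.csv": (1024, 1700000000000),
    "account/2019-05.csv": (2048, 1690000000000),
}
second = {
    "account/2019-06.csv": (4096, 1700000300000),
    "account/2019-05.csv": (2048, 1690000000000),
}
print(files_changed_in_place(first, second))
```

Any path reported here was rewritten between the two listings, which would be consistent with the concurrent file change explanation.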

Hope this helps.

If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***