Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Error reading an external table - when using a serverless compute.

MRTN
Contributor

I am trying to read an external table in Databricks, created and maintained using the delta-rs Python module. This usually works just fine, but after a recent checkpoint generation I get the error below. However, the error only appears when reading the table with a serverless PySpark cluster or a serverless SQL warehouse. When reading with an all-purpose cluster, everything works fine.

Any ideas for troubleshooting/mitigation?


UnknownException: (java.util.concurrent.ExecutionException) org.apache.spark.SparkException: [FAILED_READ_FILE.FAILED_CONVERT_PARQUET_COLUMN] Error while reading file abfss://REDACTED_CREDENTIALS(0b8c191f)@XXXXXX.dfs.core.windows.net/meta_context_table/_delta_log/00000000000000000100.checkpoint.parquet. Possible cause: Parquet column cannot be converted. SQLSTATE: KD001

6 REPLIES

Kirankumarbs
Contributor

@MRTN, to understand the problem better:

  • How are you reading this table in Databricks?
  • Did you register this table in Unity Catalog, or are you reading it directly via abfss://***?
  • What do you mean by "after a recent checkpoint generation"? i.e., are you sinking the data to this external location by streaming from the source?

balajij8
Contributor

The checkpoint likely uses a column type that serverless can't understand, which would be a delta-rs compatibility issue. One option is to avoid delta-rs.

Kirankumarbs
Contributor

This is a delta-rs-generated checkpoint that serverless Spark cannot deserialize, because serverless uses a restricted/optimized Parquet reader (serverless does not allow disabling the vectorized Parquet reader). delta-rs uses Apache Arrow and the Rust Parquet writer, which produce 100% valid Parquet but can emit encodings that serverless Spark does not accept for Delta checkpoints.

"Parquet column cannot be converted" typically means a reader-side incompatibility, not corruption.

If it is okay/possible in your setup: disable checkpoint creation in delta-rs (the best workaround). Otherwise, you might have to use a job compute cluster instead of serverless.

MRTN
Contributor

Thanks for investigating. I guess I will have to raise an issue with the maintainers of the delta-rs library and hope this gets resolved in the longer term.

We are reading the table in Databricks using the external table CREATE TABLE () LOCATION ... construct.

Louis_Frolio
Databricks Employee

Hey @MRTN , I did some digging and here is what I found: 

This pattern almost always means “checkpoint parquet schema that Photon doesn’t like,” not a core Delta protocol issue.

What you’re seeing lines up cleanly:

  • All-purpose cluster works because it’s using the standard Spark Parquet reader.

  • Serverless (SQL + PySpark) fails because it’s using Photon’s Parquet reader, which is stricter about schema mismatches and type conversions, and it throws FAILED_CONVERT_PARQUET_COLUMN when it encounters something inconsistent.

  • The error points directly at _delta_log/00000000000000000100.checkpoint.parquet, which means this is not your data files; it’s the checkpoint parquet that delta-rs wrote.

Given the table is maintained by delta-rs, the most likely root causes are:

  • A column type mismatch inside the checkpoint (for example, mixing INT32 and INT64, or DOUBLE and DECIMAL for the same logical field, or using a type Photon doesn’t support for that checkpoint column). Photon is less forgiving than the non-Photon reader.

  • Or a checkpoint encoding/format that your serverless runtime (Photon build) doesn’t fully support, while your all-purpose runtime happens to tolerate it.

 

Hope this helps, Louis.

 

SteveOstrowski
Databricks Employee

Hi @MRTN,

Errors reading external tables from serverless compute are almost always caused by how serverless handles cloud storage access compared to classic compute. Serverless compute runs in a Databricks-managed compute plane, so it cannot use DBFS mounts, instance profiles, or direct cloud credentials the way classic clusters can. Instead, it requires Unity Catalog external locations and storage credentials for all external storage access.

Here is a checklist to work through:

STEP 1: CONFIRM THE EXTERNAL TABLE IS REGISTERED IN UNITY CATALOG

Serverless compute only supports Unity Catalog. If your external table is registered in the legacy Hive metastore (hive_metastore), it will not be accessible from serverless. You can verify where the table lives by running:

DESCRIBE EXTENDED catalog_name.schema_name.table_name;

Look at the "Catalog" field in the output. If it shows "hive_metastore", you will need to migrate or re-create the table in a Unity Catalog catalog.
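As a quick sanity check outside of SQL, the catalog part of a three-level table name already tells you which metastore it resolves to. A minimal sketch, assuming no default-catalog override in the session (real resolution depends on the session's current catalog):

```python
def table_catalog(fully_qualified_name: str) -> str:
    """Return the catalog part of a three-level table name.

    Names without an explicit catalog are assumed here to resolve to
    hive_metastore; in practice this depends on the session's
    current catalog setting.
    """
    parts = fully_qualified_name.split(".")
    if len(parts) == 3:
        return parts[0]
    return "hive_metastore"  # assumption: no default catalog override


def usable_from_serverless(fully_qualified_name: str) -> bool:
    # Serverless compute can only read Unity Catalog tables,
    # so anything in hive_metastore is out of reach.
    return table_catalog(fully_qualified_name) != "hive_metastore"


print(usable_from_serverless("main.analytics.events"))          # True
print(usable_from_serverless("hive_metastore.default.events"))  # False
```

This is only a naming heuristic; DESCRIBE EXTENDED remains the authoritative check.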

STEP 2: VERIFY THE STORAGE CREDENTIAL EXISTS AND IS VALID

A storage credential is required to authorize access to the cloud storage location where your external table data lives. Check existing storage credentials:

SHOW STORAGE CREDENTIALS;

If no credential covers the storage path of your external table, your workspace admin needs to create one. For AWS, this is typically an IAM role. For Azure, it would be a managed identity or service principal. For GCP, a service account.

Documentation: https://docs.databricks.com/en/connect/unity-catalog/storage-credentials.html

STEP 3: VERIFY THE EXTERNAL LOCATION IS CONFIGURED

An external location maps a cloud storage path to a storage credential. Without this mapping, serverless compute has no way to reach the data. Check your external locations:

SHOW EXTERNAL LOCATIONS;

Confirm there is an external location whose URL path is a prefix of your external table's data path. You can check the table's location with:

DESCRIBE DETAIL catalog_name.schema_name.table_name;

The "location" field shows the cloud storage path. That path must fall under a registered external location.

Documentation: https://docs.databricks.com/en/connect/unity-catalog/external-locations.html

STEP 4: CHECK PERMISSIONS

Even if the external location and storage credential exist, your user (or the service principal running the job) needs the right Unity Catalog privileges:

- USE CATALOG on the parent catalog
- USE SCHEMA on the parent schema
- SELECT on the table itself

The storage credential and external location also need appropriate grants. You can check with:

SHOW GRANTS ON EXTERNAL LOCATION location_name;
SHOW GRANTS ON TABLE catalog_name.schema_name.table_name;
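To make the three-level privilege requirement concrete, here is a small sketch that diffs a set of granted privileges against what a serverless read needs. The privilege names follow the Unity Catalog grants listed above; the grant data itself is hypothetical:

```python
# Privileges a serverless SELECT on catalog.schema.table requires,
# per the checklist above.
REQUIRED = {
    "catalog": {"USE CATALOG"},
    "schema": {"USE SCHEMA"},
    "table": {"SELECT"},
}


def missing_privileges(grants: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return the required privileges not present in the supplied grants,
    keyed by securable level; empty dict means the read should pass."""
    return {
        level: needed - grants.get(level, set())
        for level, needed in REQUIRED.items()
        if needed - grants.get(level, set())
    }


# Hypothetical grants: schema-level USE SCHEMA is missing.
grants = {"catalog": {"USE CATALOG"}, "table": {"SELECT"}}
print(missing_privileges(grants))  # {'schema': {'USE SCHEMA'}}
```

In practice you would populate the dict from the SHOW GRANTS output above rather than by hand.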

STEP 5: RULE OUT DBFS MOUNTS

If the table was originally created using a DBFS mount path (e.g., /mnt/my-storage/...), this will not work on serverless. DBFS mounts with AWS instance profiles are explicitly not supported on serverless compute. The table needs to be re-created using a direct cloud storage URI (e.g., s3://bucket-name/path/ or abfss://container@account.dfs.core.windows.net/path/) that is covered by an external location in Unity Catalog.
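A quick heuristic for auditing table locations against this rule can be sketched as follows; the scheme list is an assumption covering the three major clouds, and real support depends on the external-location setup from Step 3:

```python
def is_serverless_readable_path(path: str) -> bool:
    """Heuristic: direct cloud-storage URIs can work on serverless
    (when covered by a Unity Catalog external location); DBFS mount
    paths cannot. Scheme list is an assumption, not exhaustive.
    """
    direct_schemes = ("s3://", "abfss://", "gs://")
    if path.startswith("/mnt/") or path.startswith("dbfs:/mnt/"):
        return False  # DBFS mount: unsupported on serverless
    return path.startswith(direct_schemes)


print(is_serverless_readable_path("/mnt/my-storage/events"))
print(is_serverless_readable_path(
    "abfss://container@account.dfs.core.windows.net/path/"))
```

Running this over the "location" field of DESCRIBE DETAIL for each external table flags the ones that need re-creating with a direct URI.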

STEP 6: NETWORK CONNECTIVITY (IF APPLICABLE)

If your workspace uses private endpoints or firewall rules on the storage account, make sure the serverless compute plane has network access. Serverless compute resources run in a Databricks-managed VPC/VNet, so firewall rules that allow only your workspace VPC may block serverless. Check the documentation on serverless networking for your cloud provider:

- AWS: https://docs.databricks.com/en/compute/serverless/network-connectivity.html
- Azure: https://learn.microsoft.com/en-us/azure/databricks/compute/serverless/network-connectivity

COMMON ERROR PATTERNS

- "Access Denied" or "403 Forbidden": Usually means the storage credential's IAM role or managed identity does not have permission to the S3 bucket/ADLS container, or the trust policy does not allow the Databricks serverless account to assume it.

- "External location not found": The table's data path is not covered by any registered external location.

- "Cannot access data": The table may be in hive_metastore instead of a Unity Catalog catalog.

If you can share the exact error message you are seeing, I can narrow down the specific cause further.

* This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.

If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.