Re: Autoloader [FAILED_READ_FILE.PARQUET_COLUMN_DA...

Yogasathyandrun · ‎06-22-2026

What you're seeing comes down to where the type mismatch is detected.

For Parquet, some mismatches can be handled at the Auto Loader layer and end up in _rescued_data, while others fail earlier inside the Parquet reader itself.

In your example, the existing schema expects a timestamp, but the new file stores the column as a plain INT64. That mismatch is detected by the Parquet reader before Auto Loader's rescue logic gets a chance to process the row, which is why you get:

FAILED_READ_FILE.PARQUET_COLUMN_DATA_TYPE_MISMATCH

instead of seeing the value in _rescued_data.

A string appearing in an integer column, by contrast, may be rescued because the file itself can still be read successfully and the mismatch only surfaces during value conversion/parsing at the record level. In that case Auto Loader can route the problematic value to _rescued_data.

So the distinction is roughly:

Record-level parsing/conversion issue → can often be rescued into _rescued_data
Parquet schema/file-level incompatibility → fails during file read and cannot be rescued
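
Since the file-level case cannot be rescued, one option is to spot it before the stream reaches the file by comparing each incoming file's schema with the expected one. This is only a rough sketch: it assumes the column types have already been extracted into plain dicts (for example from the Parquet footer), and the function name, column names, and type strings are all hypothetical.

```python
def find_type_mismatches(expected, actual):
    """Return {column: (expected_type, actual_type)} for columns present
    in both schemas whose declared types differ."""
    return {
        col: (exp_type, actual[col])
        for col, exp_type in expected.items()
        if col in actual and actual[col] != exp_type
    }


# Hypothetical schemas: the table expects a timestamp, the new file has INT64.
expected = {"id": "int64", "event_ts": "timestamp[us]"}
incoming = {"id": "int64", "event_ts": "int64"}

mismatches = find_type_mismatches(expected, incoming)
for col, (want, got) in mismatches.items():
    print(f"{col}: expected {want}, file has {got}")
```

Files flagged this way could be held back or converted before they land in the Auto Loader input path.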

For production pipelines, the common pattern is to combine:

cloudFiles.schemaHints for known drift-prone columns, and
badRecordsPath as a safety net for unexpected schema incompatibilities.
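
A minimal sketch of how those two options might be wired together. The helper name, paths, and column names are placeholders, not taken from your pipeline, and it is worth verifying in your own workspace whether badRecordsPath actually captures the Parquet mismatch above.

```python
def build_autoloader_options(schema_location, schema_hints, bad_records_path):
    """Build reader options for a Parquet Auto Loader stream.

    schema_hints maps column name -> SQL type and is rendered into the
    comma-separated string that cloudFiles.schemaHints expects.
    """
    hints = ", ".join(f"{col} {dtype}" for col, dtype in schema_hints.items())
    return {
        "cloudFiles.format": "parquet",
        "cloudFiles.schemaLocation": schema_location,
        "cloudFiles.schemaHints": hints,
        "badRecordsPath": bad_records_path,
    }


options = build_autoloader_options(
    schema_location="/tmp/example/_schemas",          # hypothetical path
    schema_hints={"event_ts": "TIMESTAMP", "user_id": "BIGINT"},  # hypothetical columns
    bad_records_path="/tmp/example/_bad_records",     # hypothetical path
)

# On Databricks, where `spark` is already defined, the options would be used as:
# df = (spark.readStream.format("cloudFiles")
#       .options(**options)
#       .load("/tmp/example/landing"))               # hypothetical input path
```

Keeping the options in a plain dict makes them easy to inspect or unit test outside a cluster.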

Data Engineer | Apache Spark | Delta Lake | Databricks
