Yogasathyandrun

What you're seeing comes down to where the type mismatch is detected.

For Parquet, some mismatches can be handled at the Auto Loader layer and end up in _rescued_data, while others fail earlier inside the Parquet reader itself.

In your example, the existing schema expects a timestamp, but the new file stores the column as a plain INT64 (an integer with no timestamp annotation). The Parquet reader detects that mismatch before Auto Loader's rescue logic gets a chance to process the row, which is why you get:

FAILED_READ_FILE.PARQUET_COLUMN_DATA_TYPE_MISMATCH

instead of seeing the value in _rescued_data.

A string appearing in an integer column may be rescued because the file itself can still be read successfully; the mismatch only surfaces during value conversion/parsing at the record level. In that case Auto Loader can route the problematic value to _rescued_data.

So the distinction is roughly:

  • Record-level parsing/conversion issue → can often be rescued into _rescued_data

  • Parquet schema/file-level incompatibility → fails during file read and cannot be rescued

For production pipelines, the common pattern is to combine:

  • cloudFiles.schemaHints for known drift-prone columns, and

  • badRecordsPath as a safety net for unexpected schema incompatibilities.
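As a rough sketch of that pattern (not a definitive setup), the options could be assembled like this. The column name event_ts, the paths, and the helper autoloader_options are hypothetical placeholders; only the option keys themselves come from Auto Loader. Whether badRecordsPath catches a given incompatibility depends on your runtime, so treat it as something to verify in your environment.

```python
# Minimal sketch: build the Auto Loader options for a Parquet stream.
# All paths and the column name below are made-up examples.

def autoloader_options(schema_location, bad_records_path, schema_hints):
    """Return reader options combining schema hints with a bad-records path."""
    return {
        "cloudFiles.format": "parquet",
        # Where Auto Loader persists the inferred schema between runs.
        "cloudFiles.schemaLocation": schema_location,
        # Pin the type of columns known to drift, e.g. "event_ts TIMESTAMP".
        "cloudFiles.schemaHints": schema_hints,
        # Safety net: location for recording files/records that cannot be read.
        "badRecordsPath": bad_records_path,
    }

opts = autoloader_options(
    schema_location="/tmp/example/_schemas",      # hypothetical path
    bad_records_path="/tmp/example/bad_records",  # hypothetical path
    schema_hints="event_ts TIMESTAMP",            # hypothetical column
)

# On a Databricks cluster (where `spark` is available) this would be used as:
# df = (spark.readStream
#         .format("cloudFiles")
#         .options(**opts)
#         .load("/tmp/example/input"))            # hypothetical input path
```

Keeping the options in a plain dict makes them easy to review and reuse across streams; the readStream call is left commented out because it only runs inside a Spark session.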

 

Data Engineer | Apache Spark | Delta Lake | Databricks
