12 hours ago
What you're seeing comes down to where the type mismatch is detected.
For Parquet, some mismatches can be handled at the Auto Loader layer and end up in _rescued_data, while others fail earlier inside the Parquet reader itself.
In your example, the existing schema expects a timestamp, but the new file stores the column as a plain INT64. That mismatch is detected by the Parquet reader before Auto Loader's rescue logic gets a chance to process the row, which is why you get:
FAILED_READ_FILE.PARQUET_COLUMN_DATA_TYPE_MISMATCH
instead of seeing the value in _rescued_data.
A string appearing in an integer column may be rescued because the file itself can still be read successfully; the mismatch only surfaces during value conversion/parsing at the record level. In that case Auto Loader can route the problematic value to _rescued_data.
So the distinction is roughly:
- Record-level parsing/conversion issue → can often be rescued into _rescued_data
- Parquet schema/file-level incompatibility → fails during file read and cannot be rescued
For production pipelines, the common pattern is to combine:
- cloudFiles.schemaHints for known drift-prone columns, and
- badRecordsPath as a safety net for unexpected schema incompatibilities.
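As a rough illustration of that pattern, here is a minimal sketch of how the two options might be wired together. The helper names (build_autoloader_options, read_parquet_stream), the paths, and the column names event_ts and amount are all hypothetical and only for illustration; the option keys are the ones named above plus the standard cloudFiles.format and cloudFiles.schemaLocation. How badRecordsPath behaves alongside Auto Loader can depend on your runtime, so treat this as a starting point to verify rather than a definitive setup.

```python
def build_autoloader_options(schema_location, bad_records_path, schema_hints):
    """Assemble reader options for an Auto Loader Parquet source.

    schema_hints is a dict of column name -> SQL type, used for columns
    that are known to drift (hypothetical example columns below).
    """
    return {
        "cloudFiles.format": "parquet",
        "cloudFiles.schemaLocation": schema_location,
        # Pin the expected types for drift-prone columns.
        "cloudFiles.schemaHints": ", ".join(
            f"{column} {sql_type}" for column, sql_type in schema_hints.items()
        ),
        # Safety net for unexpected incompatibilities (verify the behavior
        # on your runtime; this is an assumption, not a guarantee).
        "badRecordsPath": bad_records_path,
    }


def read_parquet_stream(spark, source_path, options):
    """Start an Auto Loader stream; expects an existing SparkSession."""
    return (
        spark.readStream.format("cloudFiles")
        .options(**options)
        .load(source_path)
    )


# Hypothetical paths and columns, for illustration only.
options = build_autoloader_options(
    schema_location="/tmp/example/_schemas",
    bad_records_path="/tmp/example/_bad_records",
    schema_hints={"event_ts": "TIMESTAMP", "amount": "BIGINT"},
)
```

On a cluster you would then call read_parquet_stream(spark, "<your source path>", options). Keeping the options in a plain dict makes it easy to review or unit-test the configuration without a Spark session.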