13 hours ago
The _rescued_data column in Auto Loader works for JSON and CSV formats, not Parquet. Parquet is a strongly typed format where data types are encoded in the file metadata. When a timestamp column shows up as INT64 in a new file, that is a file-format-level incompatibility. It surfaces while the Parquet reader is initializing, before Auto Loader's schema evolution or rescued data logic kicks in.
The error FAILED_READ_FILE.PARQUET_COLUMN_DATA_TYPE_MISMATCH: Expected Spark type timestamp, actual Parquet type INT64 generally comes from the low-level Parquet reader when it detects that metadata mismatch.
schemaEvolutionMode: addNewColumnsWithTypeWidening - Handles widening (for example, int to long), but timestamp to INT64 is not a widening. It is an incompatible change.
rescuedDataColumn - Only rescues data for JSON/CSV, where type mismatches are detected during parsing. It does not cover Parquet format-level conflicts.
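To make the widening distinction concrete, here is a small illustrative sketch in plain Python. The `WIDENING` table and the `is_widening` helper are hypothetical names made up for this example, and the table is only a partial list of widening pairs, not the rule set Databricks actually applies.

```python
# Illustrative subset of widening pairs (narrow type -> wider type).
# Assumption: this is NOT the complete list that type widening supports.
WIDENING = {
    ("byte", "short"),
    ("short", "int"),
    ("int", "long"),
    ("float", "double"),
}


def is_widening(expected: str, actual: str) -> bool:
    """Return True if going from `expected` to `actual` is a known widening."""
    return (expected, actual) in WIDENING


print(is_widening("int", "long"))        # True: same kind of value, larger range
print(is_widening("timestamp", "long"))  # False: a different kind of type, so not widening
```

The point is that a timestamp read as a plain 64-bit integer is a change of meaning, not just a larger range, so it falls outside anything a widening mode would accept.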
You can use badRecordsPath for Parquet files with incompatible type changes. It catches file-level read failures and allows the stream to continue while logging the error files.
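A minimal sketch of how that could be wired up, assuming a Databricks cluster where `spark` is available. The helper name `autoloader_parquet_options` and all paths are hypothetical placeholders; only the option keys (`cloudFiles.format`, `cloudFiles.schemaLocation`, `badRecordsPath`) are real settings.

```python
def autoloader_parquet_options(schema_location: str, bad_records_path: str) -> dict:
    """Build reader options for an Auto Loader Parquet stream (hypothetical helper)."""
    return {
        "cloudFiles.format": "parquet",
        # Where Auto Loader keeps the inferred schema between runs.
        "cloudFiles.schemaLocation": schema_location,
        # Files that fail to read are recorded under this path.
        "badRecordsPath": bad_records_path,
    }


# Placeholder paths; replace with your own storage locations.
options = autoloader_parquet_options(
    "/tmp/example/_schema",
    "/tmp/example/_bad_records",
)

# On Databricks, where `spark` is provided by the runtime:
# df = (
#     spark.readStream.format("cloudFiles")
#     .options(**options)
#     .load("/tmp/example/landing")
# )
```

It is worth testing this against one known-bad file first to confirm the stream keeps running and the failing file is recorded under the bad records path in your runtime version.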