07-22-2025 07:36 AM
I have a file with 100 rows, of which 2 rows are bad data and the remaining 98 rows are good data. When I use badRecordsPath, Spark moves the entire file to the bad records path, including the good data, whereas I expected only the 2 bad rows to be quarantined and the 98 good rows to load successfully. I also tried PERMISSIVE mode, but it seems no mode can be set when badRecordsPath is used, and I get an error. Please help me resolve this issue.
07-23-2025 01:57 AM
When you enable badRecordsPath in Autoloader or in Spark's file readers (with formats like CSV/JSON), here's what happens:
Spark expects each data file to be internally well-formed with respect to the declared schema.
If Spark encounters a fatal error while reading an entire file (for example, due to corrupt encoding, mismatched row/column structure, or an invalid file format), it cannot reliably parse any part of the file.
As a result, the entire file is redirected to badRecordsPath, even if most of its content is good, because Spark cannot safely guarantee the integrity of any parsed rows from that file.
Per-record handling in badRecordsPath only occurs if Spark can read the file but finds a few faulty rows; when the file cannot be opened or parsed at all, the whole file is marked as "bad."
Typical reasons a whole file gets rejected include:
Schema Mismatch: The file's structure doesn't match the schema (e.g., wrong delimiter, extra/missing columns).
File Corruption: The file is truncated or not a valid CSV/JSON/Parquet file.
Encoding Errors: The file's encoding doesn't match what Spark expects (e.g., UTF-8).
Header/Footer Issues: The file has an unexpected header, footer, or partial content.
So please validate the data file you are facing the issue with and check whether any of the issues above applies. You can also read back what Spark wrote to badRecordsPath to see the reason each record or file was quarantined, as in the sketch below.
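A minimal sketch of that inspection step, assuming the usual Databricks layout of timestamped subfolders under badRecordsPath containing bad_records/ (row-level failures) and bad_files/ (whole-file failures); the path here is only an example, so verify the layout in your own workspace:
bad_records_path = "/mnt/my-bad-records"  # example path, use your own
bad = spark.read.json(f"{bad_records_path}/*/bad_records/*")
bad.printSchema()          # typically includes the raw record and the failure reason
bad.show(truncate=False)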
07-23-2025 04:22 AM
I have already analysed the issue, and yes, the schema doesn't match for one of the rows, so the complete file was moved to badRecordsPath. I have seen the behavior and that's fine. Thanks for the response.
07-22-2025 08:04 AM
Hi @shan-databricks ,
Have you tried DROPMALFORMED mode?
Regarding PERMISSIVE mode - could you share a code snippet?
If that doesn't resolve your issue, I would recommend using custom try/except logic.
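For reference, a minimal sketch of what the DROPMALFORMED suggestion could look like as a plain batch read (my_schema and the path are placeholders, and this mode cannot be combined with badRecordsPath):
df = (spark.read
    .format("csv")
    .schema(my_schema)                 # expected schema (placeholder)
    .option("mode", "DROPMALFORMED")   # rows that don't fit the schema are silently dropped
    .load("/mnt/data"))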
07-22-2025 08:19 AM
Hi @shan-databricks ,
Maybe try to read it with PERMISSIVE mode and the rescuedDataColumn option?
df = spark.read.option("mode", "PERMISSIVE").option("rescuedDataColumn", "_rescued_data").format("csv").load("<path-to-files>")
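If that works, rows that didn't fit the schema end up with a non-null _rescued_data value, so the frame can be split afterwards. A small sketch, assuming the read above was assigned to df (the good_rows/bad_rows names are just illustrative):
from pyspark.sql import functions as F

good_rows = df.filter(F.col("_rescued_data").isNull())
bad_rows  = df.filter(F.col("_rescued_data").isNotNull())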
07-22-2025 09:01 AM - edited 07-22-2025 09:01 AM
You're facing a common issue with Spark's bad records handling.
Read the CSV in PERMISSIVE mode and capture corrupt rows:
df = (spark.read
    .option("mode", "PERMISSIVE")
    .option("columnNameOfCorruptRecord", "_corrupt_record")
    .format("csv")
    .load("s3://your-bucket/path/"))
Later you can filter the good and bad records from df.
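A minimal sketch of that filtering step (good_df/bad_df are illustrative names; note that for CSV the _corrupt_record column is typically only populated when your schema includes it as a string column, and caching avoids Spark's restriction on queries that reference only the corrupt-record column):
from pyspark.sql import functions as F

df.cache()  # avoids errors when referencing only the corrupt-record column
good_df = df.filter(F.col("_corrupt_record").isNull()).drop("_corrupt_record")
bad_df  = df.filter(F.col("_corrupt_record").isNotNull())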
07-22-2025 11:40 PM
I am using the Autoloader features spark.readStream and writeStream and have set the badRecordsPath option, so when I also set PERMISSIVE, DROPMALFORMED, or FAILFAST, I get an exception like: if 'badRecordsPath' is specified, mode is not allowed to be set.
07-23-2025 12:00 AM
Hi @shan-databricks , try the option below:
df = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("badRecordsPath", "/mnt/my-bad-records")
    # .option("mode", "PERMISSIVE")  # Do NOT set this!
    .schema(my_schema)
    .load("/mnt/data"))
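And, to complete the picture, a hedged sketch of the matching write side of that stream (the checkpoint and target paths are placeholders):
(df.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/bad-records-demo")  # placeholder
    .start("/mnt/target/my-table"))                                     # placeholder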
07-23-2025 12:06 AM
I am using the same in my code, but instead of moving only the bad rows to badRecordsPath, it moves the complete file, which also contains good data, into badRecordsPath.