<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Databricks Autoloader BadRecords path Issue in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/databricks-autoloader-badrecords-path-issue/m-p/126015#M47612</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/149095"&gt;@shan-databricks&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Have you tried DROPMALFORMED mode?&lt;/P&gt;&lt;P&gt;Regarding PERMISSIVE mode - could you share a code snippet?&lt;/P&gt;&lt;P&gt;If that doesn't resolve your issue, I would recommend custom try/except logic.&lt;/P&gt;</description>
    <pubDate>Tue, 22 Jul 2025 15:04:10 GMT</pubDate>
    <dc:creator>radothede</dc:creator>
    <dc:date>2025-07-22T15:04:10Z</dc:date>
    <item>
      <title>Databricks Autoloader BadRecords path Issue</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-autoloader-badrecords-path-issue/m-p/126006#M47611</link>
      <description>&lt;P&gt;&lt;SPAN&gt;I have one file containing 100 rows, of which two rows are bad data and the remaining 98 are good. When I set badRecordsPath, Auto Loader moves the entire file to the bad-records path, including the good rows. I expected only the 2 bad rows to be redirected while the 98 good rows load successfully. I also tried PERMISSIVE mode, but it seems no mode can be set when badRecordsPath is used, and I get an error. Please help me resolve this issue.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 22 Jul 2025 14:36:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-autoloader-badrecords-path-issue/m-p/126006#M47611</guid>
      <dc:creator>shan-databricks</dc:creator>
      <dc:date>2025-07-22T14:36:05Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Autoloader BadRecords path Issue</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-autoloader-badrecords-path-issue/m-p/126015#M47612</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/149095"&gt;@shan-databricks&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Have you tried DROPMALFORMED mode?&lt;/P&gt;&lt;P&gt;Regarding PERMISSIVE mode - could you share a code snippet?&lt;/P&gt;&lt;P&gt;If that doesn't resolve your issue, I would recommend custom try/except logic.&lt;/P&gt;</description>
      <pubDate>Tue, 22 Jul 2025 15:04:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-autoloader-badrecords-path-issue/m-p/126015#M47612</guid>
      <dc:creator>radothede</dc:creator>
      <dc:date>2025-07-22T15:04:10Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Autoloader BadRecords path Issue</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-autoloader-badrecords-path-issue/m-p/126017#M47613</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/149095"&gt;@shan-databricks&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Maybe try reading it with PERMISSIVE mode and the rescuedDataColumn option:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;df = spark.read.option("mode", "PERMISSIVE").option("rescuedDataColumn", "_rescued_data").format("csv").load(source_path)  # source_path: your input location&lt;/LI-CODE&gt;</description>
      <pubDate>Tue, 22 Jul 2025 15:19:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-autoloader-badrecords-path-issue/m-p/126017#M47613</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-07-22T15:19:04Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Autoloader BadRecords path Issue</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-autoloader-badrecords-path-issue/m-p/126028#M47615</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/149095"&gt;@shan-databricks&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;You're facing a common issue with Spark's bad-records handling.&lt;/P&gt;&lt;P&gt;Read the CSV in PERMISSIVE mode and capture corrupt rows:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;df = (spark.read
    .option("mode", "PERMISSIVE")
    .option("columnNameOfCorruptRecord", "_corrupt_record")
    .format("csv")
    .load("s3://your-bucket/path/"))&lt;/LI-CODE&gt;&lt;P&gt;Later you can filter the good and bad records from df.&lt;/P&gt;</description>
      <pubDate>Tue, 22 Jul 2025 16:01:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-autoloader-badrecords-path-issue/m-p/126028#M47615</guid>
      <dc:creator>lingareddy_Alva</dc:creator>
      <dc:date>2025-07-22T16:01:41Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Autoloader BadRecords path Issue</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-autoloader-badrecords-path-issue/m-p/126102#M47629</link>
      <description>&lt;P&gt;&lt;SPAN&gt;I am using the Auto Loader spark.readStream and writeStream APIs with the badRecordsPath option. When I also set PERMISSIVE, DROPMALFORMED, or FAILFAST, I get an exception along the lines of: if 'badRecordsPath' is specified, mode is not allowed to be set.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 23 Jul 2025 06:40:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-autoloader-badrecords-path-issue/m-p/126102#M47629</guid>
      <dc:creator>shan-databricks</dc:creator>
      <dc:date>2025-07-23T06:40:27Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Autoloader BadRecords path Issue</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-autoloader-badrecords-path-issue/m-p/126111#M47633</link>
      <description>&lt;P class="my-0"&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/149095"&gt;@shan-databricks&lt;/a&gt;&amp;nbsp;, Try with below option&lt;/P&gt;
&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;&lt;SPAN&gt;df &lt;SPAN class="token token operator"&gt;=&lt;/SPAN&gt; &lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt; &lt;/SPAN&gt;&lt;SPAN&gt; spark&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;readStream &lt;/SPAN&gt;&lt;SPAN&gt; &lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;format&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"cloudFiles"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt; &lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;&lt;SPAN&gt; &lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;option&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"cloudFiles.format"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;"csv"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;option&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"badRecordsPath"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;,&lt;/SPAN&gt; &lt;SPAN class="token token"&gt;"/mnt/my-bad-records"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt; &lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt; &lt;SPAN class="token token"&gt;#&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;&lt;SPAN&gt;&lt;SPAN class="token token"&gt;.option("mode", "PERMISSIVE") # Do NOT set this!&lt;/SPAN&gt; &lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;&lt;SPAN&gt; &lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;schema&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;my_schema&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt; &lt;/SPAN&gt;&lt;SPAN&gt; &lt;SPAN class="token token punctuation"&gt;.&lt;/SPAN&gt;load&lt;SPAN class="token token punctuation"&gt;(&lt;/SPAN&gt;&lt;SPAN class="token token"&gt;"/mnt/data"&lt;/SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt; &lt;/SPAN&gt;&lt;SPAN&gt;&lt;SPAN class="token token punctuation"&gt;)&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 23 Jul 2025 07:00:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-autoloader-badrecords-path-issue/m-p/126111#M47633</guid>
      <dc:creator>ShaileshBobay</dc:creator>
      <dc:date>2025-07-23T07:00:48Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Autoloader BadRecords path Issue</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-autoloader-badrecords-path-issue/m-p/126112#M47634</link>
      <description>&lt;P&gt;&lt;SPAN&gt;I am using the same options in my code, but instead of moving only the bad rows to badRecordsPath, it moves the complete file, which also contains good data, into badRecordsPath.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 23 Jul 2025 07:06:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-autoloader-badrecords-path-issue/m-p/126112#M47634</guid>
      <dc:creator>shan-databricks</dc:creator>
      <dc:date>2025-07-23T07:06:44Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Autoloader BadRecords path Issue</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-autoloader-badrecords-path-issue/m-p/126126#M47640</link>
      <description>&lt;H2 id="" class="mb-2 mt-6 text-base font-[500] first:mt-0 md:text-lg dark:font-[475] [hr+&amp;amp;]:mt-4"&gt;Why Entire Files Go to&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;badRecordsPath&lt;/CODE&gt;&lt;/H2&gt;
&lt;P class="my-0"&gt;When you enable&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;badRecordsPath&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;in&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;Autoloader&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;or in Spark’s file readers (with formats like CSV/JSON), here’s what happens:&lt;/P&gt;
&lt;UL class="marker:text-textOff list-disc"&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;Spark expects each data file to be&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;internally well-formed&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;with respect to the declared schema.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;If Spark encounters a&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;fatal error while reading an entire file&lt;/STRONG&gt;—for example, due to corrupt encoding, mismatched row/column structure, or invalid file format—it cannot reliably parse any part of the file.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;As a result,&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;the entire file is redirected to&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;badRecordsPath&lt;/CODE&gt;&lt;/STRONG&gt;, even if most of its content is good, because Spark cannot safely guarantee the integrity of any parsed rows from that file.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;Per-record handling&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;in&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;badRecordsPath&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;only occurs if Spark can read the file but finds a few faulty rows; when the file cannot be opened or parsed at all, the whole file is marked as "bad."&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
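&lt;P&gt;As an illustration of the per-record handling described above -- this is a plain-Python sketch, not Spark itself, and the schema width of 3 is an assumption for the toy data -- each row is judged individually, so only the malformed rows land in the bad bucket:&lt;/P&gt;

```python
import csv
import io

# Toy model of per-record bad-records handling: rows whose column count
# matches the expected schema width go to "good", the rest to "bad".
EXPECTED_COLUMNS = 3  # assumed schema width for this illustrative example

def split_records(raw_text):
    good, bad = [], []
    for row in csv.reader(io.StringIO(raw_text)):
        if len(row) == EXPECTED_COLUMNS:
            good.append(row)
        else:
            bad.append(row)
    return good, bad

# The middle row is malformed (only two columns), so it alone is quarantined.
good, bad = split_records("1,alice,10\n2,bob\n3,carol,30\n")
```

&lt;P&gt;This is exactly the behavior Spark can only offer when the file as a whole is parseable; a file-level failure short-circuits before any row is seen.&lt;/P&gt;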
&lt;H2 id="typical-root-causes" class="mb-2 mt-6 text-base font-[500] first:mt-0 md:text-lg dark:font-[475] [hr+&amp;amp;]:mt-4"&gt;Typical Root Causes&lt;/H2&gt;
&lt;UL class="marker:text-textOff list-disc"&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;Schema Mismatch:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;The file’s structure doesn’t match the schema (e.g., wrong delimiter, extra/missing columns).&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;File Corruption:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;The file is truncated or not a valid CSV/JSON/Parquet file.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;Encoding Errors:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;The file’s encoding doesn’t match what Spark expects (e.g., UTF-8).&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;Header/Footer Issues:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;If a file has an unexpected header, footer, or partial content.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
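&lt;P&gt;A quick pre-ingestion check can surface most of the root causes above before Auto Loader touches the file. This is a hypothetical plain-Python helper (the function name and the header-defines-width rule are assumptions, not a Databricks API) that flags encoding mismatches and rows whose column count deviates from the header:&lt;/P&gt;

```python
import csv
import io

def validate_csv_bytes(data, encoding="utf-8"):
    """Return a list of problems found in raw CSV bytes; empty list means OK."""
    problems = []
    try:
        text = data.decode(encoding)  # catches encoding-mismatch root cause
    except UnicodeDecodeError:
        return ["encoding mismatch: file is not valid " + encoding]
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty or truncated"]
    width = len(rows[0])  # assume the header row defines the expected column count
    for i, row in enumerate(rows[1:], start=2):
        if len(row) != width:  # catches schema-mismatch root cause
            problems.append("line %d has %d columns, expected %d" % (i, len(row), width))
    return problems
```

&lt;P&gt;Running it on a file with a short row would report, for example, &lt;CODE&gt;line 3 has 1 columns, expected 2&lt;/CODE&gt;.&lt;/P&gt;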
&lt;P&gt;So please validate the data file you are having trouble with and check whether any of the issues listed above apply.&lt;/P&gt;</description>
      <pubDate>Wed, 23 Jul 2025 08:57:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-autoloader-badrecords-path-issue/m-p/126126#M47640</guid>
      <dc:creator>ShaileshBobay</dc:creator>
      <dc:date>2025-07-23T08:57:57Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks Autoloader BadRecords path Issue</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-autoloader-badrecords-path-issue/m-p/126153#M47646</link>
      <description>&lt;P&gt;&lt;SPAN&gt;I have already analysed the issue: yes, the schema doesn't match for one of the rows, and that moved the complete file into badRecordsPath. I've now seen and understood the behavior, so that's fine. Thanks for the response.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 23 Jul 2025 11:22:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-autoloader-badrecords-path-issue/m-p/126153#M47646</guid>
      <dc:creator>shan-databricks</dc:creator>
      <dc:date>2025-07-23T11:22:43Z</dc:date>
    </item>
  </channel>
</rss>

