Databricks Autoloader BadRecords path Issue

shan-databricks
New Contributor III

I have a file with 100 rows, two of which are bad data and the remaining 98 good. When I use badRecordsPath, it moves the complete file, good data included, to the bad records path. I expected only the 2 bad rows to be moved and the 98 good rows to load successfully. I also tried PERMISSIVE mode, but it seems no mode can be set when badRecordsPath is used, and I get an error. Please help me resolve this issue.


9 REPLIES

radothede
Valued Contributor II

Hi @shan-databricks ,

Have you tried DROPMALFORMED mode?

Regarding PERMISSIVE mode, could you share a code snippet?

If that doesn't resolve your issue, I would recommend custom try/except logic. A minimal DROPMALFORMED sketch is below.
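A minimal sketch of DROPMALFORMED on a batch read, with my_schema and the paths as placeholders (note that the mode option cannot be combined with badRecordsPath, so remove that option first):

df = (
    spark.read
    .format("csv")
    .option("header", "true")  # assuming the files carry a header row
    .option("mode", "DROPMALFORMED")  # rows that don't match the schema are silently dropped
    .schema(my_schema)  # placeholder StructType describing the expected columns
    .load("/mnt/data")  # placeholder input path
)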

 

szymon_dybczak
Esteemed Contributor III

Hi @shan-databricks ,

Maybe try to read it with PERMISSIVE mode and the rescuedDataColumn option?

df = (
    spark.read
    .option("mode", "PERMISSIVE")
    .option("rescuedDataColumn", "_rescued_data")
    .format("csv")
    .load("/path/to/source")  # placeholder source path
)

 

Hi @shan-databricks 

You're facing a common issue with Spark's bad records handling.


Read the CSV in PERMISSIVE mode and capture corrupt rows:

df = (
    spark.read
    .option("mode", "PERMISSIVE")
    .option("columnNameOfCorruptRecord", "_corrupt_record")
    .format("csv")
    .schema(schema_with_corrupt_field)  # the schema must include a _corrupt_record string column
    .load("s3://your-bucket/path/")
)

Later you can filter the good and bad records out of df, as in the sketch below.
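A hedged sketch of that filtering step (the target paths are placeholders; caching first works around Spark's restriction on queries that reference only the internal corrupt-record column):

from pyspark.sql import functions as F

df.cache()  # avoids an AnalysisException when filtering on _corrupt_record alone

good_df = df.filter(F.col("_corrupt_record").isNull()).drop("_corrupt_record")
bad_df = df.filter(F.col("_corrupt_record").isNotNull())

good_df.write.mode("append").format("delta").save("/mnt/target")  # placeholder target path
bad_df.write.mode("append").format("json").save("/mnt/quarantine")  # placeholder quarantine path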

 

LR

I am using the Autoloader features spark.readStream and writeStream, and I have set the badRecordsPath option. When I also set PERMISSIVE, DROPMALFORMED, or FAILFAST mode, I get an exception like: if 'badRecordsPath' is specified, mode is not allowed to be set.


ShaileshBobay
Databricks Employee

Hi @shan-databricks, try the option below:

df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("badRecordsPath", "/mnt/my-bad-records")
    # .option("mode", "PERMISSIVE")  # Do NOT set this! mode cannot be combined with badRecordsPath
    .schema(my_schema)
    .load("/mnt/data")
)
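For completeness, a minimal sketch of the matching writeStream, assuming a Delta target; the checkpoint path and table name are placeholders:

query = (
    df.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/my_table")  # placeholder checkpoint path
    .outputMode("append")
    .toTable("my_catalog.my_schema.my_table")  # placeholder target table
)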

I am using the same in my code, but instead of moving only the bad data to badRecordsPath, it moves the complete file, which also contains good data, into badRecordsPath.

ShaileshBobay
Databricks Employee

Why Entire Files Go to badRecordsPath

When you enable badRecordsPath in Autoloader or in Spark's file readers (with formats like CSV/JSON), here's what happens:

  • Spark expects each data file to be internally well-formed with respect to the declared schema.

  • If Spark encounters a fatal error while reading a file (for example, due to corrupt encoding, mismatched row/column structure, or an invalid file format), it cannot reliably parse any part of the file.

  • As a result, the entire file is redirected to badRecordsPath, even if most of its content is good, because Spark cannot safely guarantee the integrity of any parsed rows from that file.

  • Per-record handling in badRecordsPath only occurs if Spark can read the file but finds a few faulty rows; when the file cannot be opened or parsed at all, the whole file is marked as "bad."

Typical Root Causes

  • Schema Mismatch: The file's structure doesn't match the schema (e.g., wrong delimiter, extra/missing columns).

  • File Corruption: The file is truncated or not a valid CSV/JSON/Parquet file.

  • Encoding Errors: The file's encoding doesn't match what Spark expects (e.g., UTF-8).

  • Header/Footer Issues: The file has an unexpected header, footer, or partial content.

So please validate the data file you are facing the issue with and check whether any of the issues above apply. The sketch below shows one way to inspect what landed in badRecordsPath.
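As a diagnostic aid, here is a hedged sketch for inspecting the bad-records output (the glob path is a placeholder matching the badRecordsPath setting; Spark stores the exceptions as JSON under timestamped bad_records directories):

# Hedged sketch: adjust the glob to your badRecordsPath setting.
bad = spark.read.json("/mnt/my-bad-records/*/bad_records/")

# Each JSON record typically carries the source file path, the raw record,
# and the reason the record failed to parse.
bad.select("path", "record", "reason").show(truncate=False)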

I have already analysed the issue, and yes, the schema doesn't match in one of the rows, which moved the complete file into badRecords. I have seen the behavior now and that's fine. Thanks for the response.