Records are missing while creating new dataframe from one big dataframe using filter

Community Platform Discussions

Connect with fellow community members to discuss general topics related to the Databricks platform, industry trends, and best practices. Share experiences, ask questions, and foster collaboration within the community.

Hi,

I have data in file like below

I have different types of row in my input file, column number 8 defines the type of the record.

In the above file we have 4 types of records 00 to 03

My requirement is:

There will be multiple files in the source path, each file having nearly 1 million records
Read the files and create different dataframes based on record type using filter on original dataframe(all record type dataframe)
Based on mapping file select the column positions and map it to column name
Create dictionary of dataframes with record type is key and dataframe is the value

My code looks like below

Issue is for some records are missing from result dataframes.

Example:

for id: 1836, record type:01 there should be 15 records but we get only 14. If you re run again, we will get same issue in another file for another id.
in the original dataframe: Total 18 rows are there for id:1836, out of 18, 15 are related to record type 01.