I work with Spark-Scala and receive data in different formats (.csv/.xlsx/.txt etc.). When I read/write this data from different sources into a database, many records get rejected due to various issues (special characters, data type mismatches between the source and target table, etc.). When that happens, the entire load fails.
What I want is a way to capture the rejected rows in a separate file and continue loading the remaining correct records into the database table.
Basically, the flow of the program should not stop because of a few bad rows; I want to catch the rows causing the problem.
Example:
I read a .csv with 98 valid rows and 2 corrupt rows. I want to write the 98 valid rows to the database and send the 2 corrupt rows back to the user as a file.
P.S. I am receiving the data from users, so I can't define a schema up front; I need a dynamic way to read the file and filter the corrupt rows out into a file.
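
For the .csv case, here is roughly the kind of thing I have in mind (a minimal sketch; `inputPath`, the reject path, and the app name are placeholders): infer the schema from the file itself so nothing is hard-coded, then re-read in Spark's PERMISSIVE mode with a `columnNameOfCorruptRecord` column, so malformed rows land in that column instead of failing the job.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StringType, StructType}

val spark = SparkSession.builder().appName("reject-capture").getOrCreate()
val inputPath = "/path/to/user_file.csv" // placeholder

// First pass: let Spark infer the schema dynamically from the file,
// since the schema can't be defined up front.
val inferredSchema: StructType = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv(inputPath)
  .schema

// Second pass: add a corrupt-record column and read permissively,
// so bad rows are captured instead of failing the load.
val schemaWithCorrupt = inferredSchema.add("_corrupt_record", StringType)

val df = spark.read
  .option("header", "true")
  .option("mode", "PERMISSIVE")
  .option("columnNameOfCorruptRecord", "_corrupt_record")
  .schema(schemaWithCorrupt)
  .csv(inputPath)
  .cache() // Spark requires caching before filtering on the corrupt-record column alone

val goodRows = df.filter(df("_corrupt_record").isNull).drop("_corrupt_record")
val badRows  = df.filter(df("_corrupt_record").isNotNull).select("_corrupt_record")

// goodRows would go to the database (e.g. via goodRows.write.jdbc(...)),
// while the raw text of the rejected rows goes back to the user as a file.
badRows.write.mode("overwrite").text("/path/to/rejects") // placeholder path
```

Something like this covers the .csv case for me; I'd need an equivalent approach for the other formats, but this shows the 98-good / 2-bad split behavior I'm after.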