Data validation with df writes using append mode

RevanthV
New Contributor III

Hi Team,

Recently I came across a situation where I had to write a huge amount of data, and the write took 6 hrs to complete. Later, when I checked the target data, I saw that 20% of the total records had been written incorrectly or were corrupted, because the source data itself was corrupted.

Because of this, I had to rewrite the entire data again in two steps:

Create a DataFrame, filter the valid rows, and write the data.

This took even more time than the first write (8 hrs). I just wanted to know if there is something that could validate the records (based on a condition) and throw an error if the data is incorrect before writing it in append mode, so that we can save a lot of time (12 hrs in this case) and compute as well.

There is an option called replaceWhere which does the same, but it is not applicable for append mode?

Any idea on how we can get through this?
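
To make the ask concrete, here is a minimal sketch of the kind of pre-write check I have in mind. It is only an illustration, not an existing option: `validate_then_append`, `InvalidRowsError`, and the `condition` / `target_path` parameters are hypothetical names I made up, and I am assuming the target is a Delta path. It relies only on the standard `DataFrame.filter` (with a SQL expression string), `limit`, `count`, and `DataFrameWriter` calls.

```python
class InvalidRowsError(ValueError):
    """Raised when at least one row fails the validation condition."""


def validate_then_append(df, condition, target_path, fmt="delta"):
    """Append `df` to `target_path` only if every row satisfies `condition`.

    `condition` is a SQL boolean expression string (hypothetical example:
    "id IS NOT NULL AND amount >= 0"). Nothing is written if any row fails.
    """
    # Treat rows where the condition is false OR evaluates to NULL as invalid.
    invalid = df.filter(f"NOT coalesce(({condition}), false)")

    # We only need to know whether a bad row exists, so limit(1) is enough
    # and may let Spark stop before scanning the whole source.
    if invalid.limit(1).count() > 0:
        raise InvalidRowsError(
            f"Source contains rows violating: {condition}; nothing was written"
        )

    # All rows passed the check, so the append can proceed.
    df.write.format(fmt).mode("append").save(target_path)


# Example call (assumes an existing DataFrame `source_df`; the path and
# condition are placeholders):
# validate_then_append(source_df, "id IS NOT NULL AND amount >= 0", "/mnt/target/table")
```

The trade-off is an extra pass over the source before the write, but that should be much cheaper than a 6 hr write followed by a full rewrite. Ideally the same condition could be enforced on the target table itself so that every append is checked automatically.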