Data validation with df writes using append mode
yesterday - last edited yesterday
Hi Team,
Recently I came across a situation where I had to write a huge amount of data, and the write took 6 hrs to complete. Later, when I checked the target data, I saw that 20% of the total records had been written incorrectly or were corrupted, because the source data itself was corrupted.
Because of this, I had to rewrite the entire data again in two steps:
create a DataFrame, then filter the valid rows and write the data.
This took even more time than the first write (8 hrs). I wanted to know whether there is something that can validate the records (based on a condition) and throw an error if the data is incorrect before it is written in append mode, so that we can save a lot of time (12 hrs in this case) and compute as well.
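To make the ask concrete, here is a minimal sketch of the kind of pre-write check I have in mind. It is not an existing Spark option: `append_if_valid` and `InvalidDataError` are hypothetical names, the target is assumed to be a path-based table (Delta by default), and the condition is assumed to be expressible as a Spark SQL boolean expression string. It only relies on the standard DataFrame calls `filter`, `limit`, `count` and `write`.

```python
class InvalidDataError(ValueError):
    """Raised when at least one row fails the validation condition."""


def append_if_valid(df, valid_condition, target_path, fmt="delta"):
    """Append df to target_path only if every row satisfies valid_condition.

    valid_condition is a Spark SQL boolean expression string,
    for example "amount >= 0 AND id IS NOT NULL" (hypothetical columns).
    """
    # A row is invalid if the condition is false OR evaluates to NULL;
    # coalesce(..., false) turns the NULL case into an explicit failure.
    invalid = df.filter(f"NOT coalesce(({valid_condition}), false)")

    # limit(1) means we only need to find one bad row, not count them all.
    if invalid.limit(1).count() > 0:
        raise InvalidDataError(
            f"Found rows violating: {valid_condition}; nothing was written"
        )

    # Only reached when no invalid row was found.
    df.write.format(fmt).mode("append").save(target_path)
```

Usage would look like `append_if_valid(source_df, "amount >= 0", "/mnt/target/table")`, where the DataFrame, condition and path are placeholders. The check still scans the source once before the write, so it costs extra read time, but it fails before any data reaches the target. If the target is a Delta table, a `CHECK` constraint added with `ALTER TABLE ... ADD CONSTRAINT` may be another way to have an append rejected when rows violate a condition.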
There is an option called replaceWhere which does something similar, but it does not seem to be applicable to append mode?
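For reference, this is roughly how I understand replaceWhere to be used, as a sketch only. The helper name `overwrite_where` is hypothetical, the target is assumed to be a Delta table at a path, and the predicate is a placeholder. As far as I know, it is an option for overwrite mode: it replaces the rows matching the predicate, and Delta by default rejects the write if the new data does not satisfy that predicate.

```python
def overwrite_where(df, predicate, target_path):
    """Overwrite only the rows of a Delta table that match predicate.

    predicate is a SQL expression string, for example
    "event_date >= '2024-01-01'" (hypothetical column).
    """
    (
        df.write.format("delta")
        .mode("overwrite")                  # replaceWhere goes with overwrite, not append
        .option("replaceWhere", predicate)  # limits the overwrite to matching rows
        .save(target_path)
    )
```

This replaces existing rows rather than adding to them, which is why it does not fit my append case directly.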
Any idea on how we can get through this?