Multiple tables are configured in a config/metadata table. These tables need to be
validated against the following data quality (DQ) rules:
1. Natural Key / Business Key / Primary Key cannot be null or blank.
2. Natural Key / Primary Key cannot be duplicated.
3. Join columns cannot have missing values.
4. Business-specific rules.
How can the above rules be validated dynamically for the
tables configured in the metadata table?
Please suggest an approach using PySpark.
How should a data validation rule engine be built using PySpark?
Example:
There are 2 tables, Employee and Address.
Some generic function would be written that verifies that EmployeeId is not
null/blank for the Employee table and that Addressline1 is not null/blank for
the Address table.
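
To make the question concrete, here is a minimal sketch of what such a generic function might look like. Everything in it is an assumption rather than an existing design: the `Rule` layout of the metadata rows, the rule type names, the `failure_predicate` and `run_rules` helpers, and the custom length rule on Addressline1 are all hypothetical. It relies only on standard PySpark DataFrame methods (`filter` with a SQL string, `groupBy`, `count`). Rule 3 is treated as a null/blank check on the join columns.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Rule:
    """One row of a hypothetical DQ metadata table."""
    table: str
    rule_type: str                     # "not_null_or_blank", "unique" or "custom"
    columns: List[str]
    expression: Optional[str] = None   # Spark SQL predicate for a VALID row ("custom" only)


def failure_predicate(rule: Rule) -> str:
    """Build a Spark SQL predicate that is true for rows violating a row-level rule."""
    if rule.rule_type == "not_null_or_blank":
        checks = [
            f"(`{c}` IS NULL OR trim(cast(`{c}` AS string)) = '')"
            for c in rule.columns
        ]
        return " OR ".join(checks)
    if rule.rule_type == "custom":
        # A NULL result of the business expression is counted as a failure.
        return f"NOT coalesce(({rule.expression}), false)"
    raise ValueError(f"{rule.rule_type!r} is not a row-level rule")


def run_rules(tables: Dict[str, object], rules: List[Rule]) -> List[Tuple[str, str, List[str], int]]:
    """Apply each rule to its table (a PySpark DataFrame) and collect failure counts.

    For "unique" rules the count is the number of key values that occur more
    than once; for the other rules it is the number of failing rows.
    """
    results = []
    for rule in rules:
        df = tables[rule.table]
        if rule.rule_type == "unique":
            failed = df.groupBy(*rule.columns).count().filter("`count` > 1").count()
        else:
            failed = df.filter(failure_predicate(rule)).count()
        results.append((rule.table, rule.rule_type, rule.columns, failed))
    return results


# Example metadata for the two tables in the question.
RULES = [
    Rule("Employee", "not_null_or_blank", ["EmployeeId"]),
    Rule("Employee", "unique", ["EmployeeId"]),
    Rule("Address", "not_null_or_blank", ["Addressline1"]),
    # Hypothetical business rule, for illustration only.
    Rule("Address", "custom", ["Addressline1"], "length(Addressline1) <= 100"),
]
```

With a SparkSession available, the metadata rows could be read from the config table into `Rule` objects, and `run_rules({"Employee": employee_df, "Address": address_df}, RULES)` would return one tuple per rule with its failure count. Keeping the rules as SQL predicate strings means new business rules can be added as metadata rows without changing the code.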