Hi All,
I have a daily Spark job that reads and joins 3-4 source tables and writes the resulting DataFrame out in Parquet format. The DataFrame has 100+ columns. Since the job runs daily, our deduplication logic identifies the latest record from each source table, joins them, and then overwrites the existing Parquet output in full.
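For context, the current shape of the job is roughly the following (table names, keys, and the updated_at column are placeholders, not our real schema):

from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.appName("daily_join_job").getOrCreate()

# Placeholder sources -- the real job joins 3-4 tables.
orders = spark.read.table("src.orders")
customers = spark.read.table("src.customers")

# Dedup: keep only the latest record per key from each source.
w = Window.partitionBy("order_id").orderBy(F.col("updated_at").desc())
latest_orders = (orders
                 .withColumn("rn", F.row_number().over(w))
                 .filter(F.col("rn") == 1)
                 .drop("rn"))

joined = latest_orders.join(customers, "customer_id", "left")

# Today: full overwrite of the existing Parquet output, every run.
joined.write.mode("overwrite").parquet("/data/daily_output")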
My question: is there a way to implement an incremental write, so that rows are written only when a record is new or when values in an existing record have changed, rather than overwriting the whole file every day?
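Since plain Parquet files can't be updated in place, I'm guessing this would need a table format that supports merges, such as Delta Lake. What I'm imagining is something like the sketch below (continuing from the snippet above; the row_hash column and the order_id key are hypothetical, just to illustrate change detection over 100+ columns):

from delta.tables import DeltaTable
from pyspark.sql import functions as F

# Hypothetical: hash all columns so changed rows can be detected cheaply.
# (concat_ws skips nulls, so a real implementation would need null handling.)
incoming = joined.withColumn(
    "row_hash", F.sha2(F.concat_ws("||", *joined.columns), 256))

# Assumes the output has been converted to a Delta table, not plain Parquet.
target = DeltaTable.forPath(spark, "/data/daily_output")

(target.alias("t")
 .merge(incoming.alias("s"), "t.order_id = s.order_id")
 .whenMatchedUpdateAll(condition="t.row_hash <> s.row_hash")  # changed rows only
 .whenNotMatchedInsertAll()                                   # new rows only
 .execute())

Is something along these lines the right approach, or is there a way to achieve this while staying on plain Parquet?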