Hi,
Could someone please help me with this code?
The input parameter df is a Spark Structured Streaming DataFrame.
def apply_duplicacy_check(df, duplicate_check_columns):
    if len(duplicate_check_columns) == 0:
        return None, df
    valid_df = df.dropDuplicates(duplicate_check_columns)
    error_df = df.exceptAll(valid_df)
    return error_df, valid_df
I am getting this error:
Except on a streaming DataFrame/Dataset on the right is not supported;
Except All true
:- Project [page#54781.Name AS division_name#54786, page#54781.ShortName AS short_name#54787, page#54781.ExternalSystemCode AS external_system_code#54788, page#54781.AccountingCode AS division_number#54789, page#54781.ParentDivisionId AS parent_division_id#54790, page#54781.TimeZone AS timezone#54791, page#54781.DivisionType.Id AS division_type_id#54792, page#54781.DivisionType.Name AS division_type_name#54793, sourceExtractDatetime#54773 AS source_extract_datetime#54794, page#54781.Id AS division_id#54795]
: +- Project [Data#54772, sourceExtractDatetime#54773, page#54781]
: +- Generate explode(Data#54772.Page), true, [page#54781]