AttributeError: 'DataFrame' object has no attribute 'dropDuplicatesWithinWatermark'

johanjohan
New Contributor

Hello,

I have some trouble deduplicating rows on the "id" column, with the method "dropDuplicatesWithinWatermark" in a pipeline. When I run this pipeline, I get the error message:

"AttributeError: 'DataFrame' object has no attribute 'dropDuplicatesWithinWatermark'"

 

Here is part of the code:

 
@dlt.table(
name="streaming_table",
comment="This table is used to test the drop duplicates with watermark"
)

def streaming_table_fct():
stream_df = spark.readStream.table("schema.table") \
.filter(f.col("kind") == "abc") \
.withWatermark("meta_created", "24 hours")
 
stream_df.dropDuplicatesWithinWatermark(["id"])

return stream_df