Hello,
I have some trouble deduplicating rows on the "id" column, with the method "dropDuplicatesWithinWatermark" in a pipeline. When I run this pipeline, I get the error message:
"AttributeError: 'DataFrame' object has no attribute 'dropDuplicatesWithinWatermark'"
Here is part of the code:
@dlt.table(
name="streaming_table",
comment="This table is used to test the drop duplicates with watermark"
)
def streaming_table_fct():
stream_df = spark.readStream.table("schema.table") \
.filter(f.col("kind") == "abc") \
.withWatermark("meta_created", "24 hours")
stream_df.dropDuplicatesWithinWatermark(["id"])
return stream_df