AttributeError: 'DataFrame' object has no attribut...

johanjohan · ‎02-19-2024

Hello,

I have some trouble deduplicating rows on the "id" column, with the method "dropDuplicatesWithinWatermark" in a pipeline. When I run this pipeline, I get the error message:

"AttributeError: 'DataFrame' object has no attribute 'dropDuplicatesWithinWatermark'"

Here is part of the code:

@dlt.table(

name="streaming_table",

comment="This table is used to test the drop duplicates with watermark"

)

def streaming_table_fct():

stream_df = spark.readStream.table("schema.table") \

.filter(f.col("kind") == "abc") \

.withWatermark("meta_created", "24 hours")

stream_df.dropDuplicatesWithinWatermark(["id"])

return stream_df