AttributeError: 'DataFrame' object has no attribute 'dropDuplicatesWithinWatermark'

johanjohan
New Contributor

Hello,

I'm having trouble deduplicating rows on the "id" column with the dropDuplicatesWithinWatermark method in a Delta Live Tables pipeline. When I run the pipeline, I get this error message:

"AttributeError: 'DataFrame' object has no attribute 'dropDuplicatesWithinWatermark'"

 

Here is part of the code:

 
@dlt.table(
    name="streaming_table",
    comment="This table is used to test the drop duplicates with watermark"
)
def streaming_table_fct():
    stream_df = spark.readStream.table("schema.table") \
        .filter(f.col("kind") == "abc") \
        .withWatermark("meta_created", "24 hours")

    stream_df.dropDuplicatesWithinWatermark(["id"])

    return stream_df

Kaniz
Community Manager

Hi @johanjohan, the error message "AttributeError: 'DataFrame' object has no attribute 'dropDuplicatesWithinWatermark'" means that the DataFrame class in your runtime has no such method. dropDuplicatesWithinWatermark is a real API, but it was only introduced in Apache Spark 3.5.0 (Databricks Runtime 14.0 and later), so on an older runtime or DLT channel the call fails with exactly this AttributeError.
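A quick way to check what your pipeline is actually running (spark here is the session object already available in your notebook or pipeline):

# The method only exists on Spark 3.5.0+ DataFrames.
print(spark.version)
print(hasattr(spark.range(1), "dropDuplicatesWithinWatermark"))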

Here are a few points to consider:

  1. Runtime Version: Either move the pipeline to a runtime/channel based on Spark 3.5.0 or later, or fall back to the dropDuplicates method, which is available everywhere. Also note that both methods return a new DataFrame rather than modifying the existing one: in your snippet the result of dropDuplicatesWithinWatermark is never assigned, so even on a supported runtime the deduplication would be lost.

  2. Watermark Column: When using dropDuplicates on a stream with a watermark, include the watermark column in the list of columns to deduplicate on. In your case the watermark column is "meta_created", so it should appear in the dropDuplicates call alongside "id" (see the example below); this lets Spark expire old deduplication state once it falls behind the watermark.

  3. Stateful Operation: If your deduplication rule is more complex than an exact match on columns (for example, keeping only the earliest record per "id" based on "meta_created"), you need a custom stateful operation; the built-in dropDuplicates method does not support that directly. A sketch follows the example below.

Here’s an alternative approach you can consider:

@dlt.table(
    name="streaming_table",
    comment="This table is used to test the drop duplicates with watermark"
)
def streaming_table_fct():
    stream_df = spark.readStream.table("schema.table") \
        .filter(f.col("kind") == "abc") \
        .withWatermark("meta_created", "24 hours")

    # Include the watermark column so Spark can expire per-key
    # deduplication state once it falls behind the watermark.
    stream_df = stream_df.dropDuplicates(["id", "meta_created"])

    return stream_df
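If you do need the custom stateful route from point 3, here is a minimal sketch using applyInPandasWithState, PySpark's entry point for arbitrary stateful processing (Spark 3.4+). The output/state schemas and the keep-first policy are illustrative assumptions, not taken from your pipeline; stream_df is the watermarked stream from above:

import pandas as pd
from typing import Iterator
from pyspark.sql.streaming.state import GroupState, GroupStateTimeout

# Illustrative schemas -- adjust to the real columns of schema.table.
OUTPUT_SCHEMA = "id STRING, meta_created TIMESTAMP"
STATE_SCHEMA = "seen BOOLEAN"

def keep_first(key, pdfs: Iterator[pd.DataFrame], state: GroupState) -> Iterator[pd.DataFrame]:
    # Emit a row for an id only the first time the id is seen; after
    # that, the state marks the id as known and later rows are dropped.
    seen = state.get[0] if state.exists else False
    for pdf in pdfs:
        if not seen:
            yield pdf.sort_values("meta_created").head(1)[["id", "meta_created"]]
            seen = True
    state.update((seen,))

deduped_df = (
    stream_df.groupBy("id")
    .applyInPandasWithState(
        keep_first,
        outputStructType=OUTPUT_SCHEMA,
        stateStructType=STATE_SCHEMA,
        outputMode="append",
        timeoutConf=GroupStateTimeout.NoTimeout,  # consider a timeout to bound state size
    )
)

In a DLT table function you would build deduped_df the same way and return it instead of stream_df.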

Remember to adjust the custom stateful operation according to your use case. If you need further assistance, feel free to ask! 😊
