Databricks Community

ChristianHofste · ‎05-07-2020

Hi,

There is a function to delete data from a Delta table:

from delta.tables import DeltaTable
from pyspark.sql.functions import col
deltaTable = DeltaTable.forPath(spark, "/data/events/")
deltaTable.delete(col("date") < "2017-01-01")

But is there also a way to drop duplicates somehow? Like deltaTable.dropDuplicates()...

I don't want to read the whole table as a DataFrame, drop the duplicates, and write it back to storage.

shyam_9 · ‎05-19-2020

Hi @Christian Hofstetter,

You can find information on this here:

Drop duplicates in Table
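
For reference, as far as I know DeltaTable has no dropDuplicates() method, so one common workaround is to load the table, call the DataFrame's dropDuplicates(), and overwrite the same path. The sketch below assumes exactly that approach; the function name and the key_cols parameter are made up for illustration, and it does rewrite the table, which is what the question hoped to avoid. Note that dropDuplicates() keeps an arbitrary row from each group of duplicates.

```python
def drop_duplicates_in_place(spark, path, key_cols=None):
    """Rewrite the Delta table at `path` without duplicate rows.

    Hypothetical helper, not part of the Delta Lake API.
    `spark` is an existing SparkSession with Delta Lake available.
    `key_cols` is an optional list of column names that define a duplicate;
    None means rows must match on every column.
    """
    # Load the current snapshot of the table as a DataFrame.
    df = spark.read.format("delta").load(path)

    # Keep one (arbitrary) row per distinct key.
    deduped = df.dropDuplicates(key_cols)

    # Overwrite the table with the deduplicated rows as a new table version.
    deduped.write.format("delta").mode("overwrite").save(path)
```

Usage with the table from the question would look like drop_duplicates_in_place(spark, "/data/events/"). Because the overwrite is written as a new version, the previous one should remain available through time travel until it is vacuumed.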