Drop duplicates in Table

ChristianHofste
New Contributor II

Hi,

There is a function to delete data from a Delta table:

from delta.tables import DeltaTable
from pyspark.sql.functions import col
deltaTable = DeltaTable.forPath(spark, "/data/events/")
deltaTable.delete(col("date") < "2017-01-01")

Is there also a way to drop duplicates, something like deltaTable.dropDuplicates()?

I don't want to read the whole table as a DataFrame, drop the duplicates, and write it back to storage.

1 REPLY

shyam_9
Valued Contributor

Hi @Christian Hofstetter,

The Delta Lake documentation covers this in its section on data deduplication when writing into Delta tables:

https://docs.delta.io/0.4.0/delta-update.html#data-deduplication-when-writing-into-delta-tables
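
For readers who want a starting point, here is a minimal sketch of the insert-only merge pattern that the linked section describes. The function names, the aliases "t" and "s", and the key-column list are assumptions made for illustration; they are not from the original posts. Note that this pattern only keeps new duplicates out when writing; it does not remove rows that are already duplicated in the table.

```python
from typing import Sequence


def build_merge_condition(keys: Sequence[str], target: str = "t", source: str = "s") -> str:
    """Build a SQL equality condition over the key columns, e.g. 't.id = s.id'."""
    if not keys:
        raise ValueError("at least one key column is required")
    return " AND ".join(f"{target}.{k} = {source}.{k}" for k in keys)


def insert_only_new_rows(spark, table_path: str, new_rows, keys: Sequence[str]) -> None:
    """Insert rows from new_rows whose key values are not already in the Delta table.

    Hypothetical helper: `new_rows` is assumed to be a Spark DataFrame with the
    same schema as the table, and `keys` the columns that identify a duplicate.
    """
    # Imported lazily so build_merge_condition can be used without Delta installed.
    from delta.tables import DeltaTable

    delta_table = DeltaTable.forPath(spark, table_path)
    # Deduplicate the incoming batch itself; the merge only compares it to the table.
    deduped = new_rows.dropDuplicates(list(keys))
    (
        delta_table.alias("t")
        .merge(deduped.alias("s"), build_merge_condition(keys))
        .whenNotMatchedInsertAll()  # rows with a matching key are left untouched
        .execute()
    )
```

Assuming a DataFrame named new_events and a key column named eventId (both hypothetical), a call might look like insert_only_new_rows(spark, "/data/events/", new_events, ["eventId"]).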