Data Engineering

Forum Posts

ChristianHofste
by New Contributor II
  • 10546 Views
  • 1 replies
  • 0 kudos

Drop duplicates in Table

Hi, there is a function to delete data from a Delta Table:

deltaTable = DeltaTable.forPath(spark, "/data/events/")
deltaTable.delete(col("date") < "2017-01-01")

But is there also a way to drop duplicates somehow? Like deltaTable.dropDuplicates()......

Latest Reply
shyam_9
Valued Contributor
  • 0 kudos

Hi @Christian Hofstetter, you can find information on this here: https://docs.delta.io/0.4.0/delta-update.html#data-deduplication-when-writing-into-delta-tables
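For readers who want something concrete, here is a minimal sketch of two ways this could be done with the Delta Lake Python API. It is not taken from the reply above. The aliases "t" and "s", the function names, and the key columns are all hypothetical. The first function follows the insert-only merge idea described in the linked docs (skip rows whose keys already exist); the second rewrites the table to remove duplicates that are already stored, which is an assumption about what the question was after.

```python
from typing import Sequence


def merge_condition(keys: Sequence[str], target: str = "t", source: str = "s") -> str:
    """Build an equality condition over the key columns for a Delta merge."""
    if not keys:
        raise ValueError("at least one key column is required")
    return " AND ".join(f"{target}.{k} = {source}.{k}" for k in keys)


def append_without_duplicates(spark, path: str, new_rows, keys: Sequence[str]) -> None:
    """Insert only the rows whose keys are not already in the Delta table at `path`."""
    # Imported lazily so merge_condition() works without Delta Lake installed.
    from delta.tables import DeltaTable

    # Remove duplicates inside the incoming batch first.
    deduped = new_rows.dropDuplicates(list(keys))
    (
        DeltaTable.forPath(spark, path).alias("t")
        .merge(deduped.alias("s"), merge_condition(keys))
        .whenNotMatchedInsertAll()  # matched rows are left untouched
        .execute()
    )


def rewrite_without_duplicates(spark, path: str, keys: Sequence[str]) -> None:
    """Drop duplicates already stored in the table by rewriting it (assumed approach)."""
    df = spark.read.format("delta").load(path).dropDuplicates(list(keys))
    df.write.format("delta").mode("overwrite").save(path)
```

Note that dropDuplicates keeps an arbitrary row from each group of duplicates, so if it matters which copy survives, the rows would need to be ranked explicitly before writing.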

Anbazhagananbut
by New Contributor II
  • 7882 Views
  • 1 replies
  • 1 kudos

How to handle Blank values in Array of struct elements in pyspark

Hello all, we have data in a column of a PySpark DataFrame with an array-of-struct type that has multiple nested fields. If the value is not blank, it will save the data in the same array-of-struct type in a Spark Delta table. Please advise on the bel...

Latest Reply
shyam_9
Valued Contributor
  • 1 kudos

Hi @Anbazhagan anbutech17, can you please try the approach in the answers here: https://stackoverflow.com/questions/56942683/how-to-add-null-columns-to-complex-array-struct-in-spark-with-a-udf

GuidoPereyra_
by New Contributor II
  • 6257 Views
  • 2 replies
  • 0 kudos

Databricks Delta - UPDATE error

Hi, we got the following error when we tried to UPDATE a Delta table from concurrent notebooks that all end with an update to the same table: "com.databricks.sql.transaction.tahoe.ConcurrentAppendException: Files were added matching 'true' by a ...

Latest Reply
GuidoPereyra_
New Contributor II
  • 0 kudos

Hi @matt@direction.consulting, I just found the following doc: https://docs.azuredatabricks.net/delta/isolation-level.html#set-the-isolation-level. In my case, I was able to fix it by partitioning the table, and I think that is the only way for concurrent updates in t...
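As an illustration of the partitioning fix, here is a rough sketch. It assumes the table is partitioned by some column and that each notebook updates a different partition value. The function names, the column, and the example values are hypothetical and not from the thread. The idea is to put the partition predicate explicitly in the update condition so that concurrent writers touch disjoint sets of files.

```python
from typing import Dict


def partition_predicate(column: str, value: str) -> str:
    """Return a SQL predicate that pins an operation to one partition value."""
    # Kept deliberately strict: values with quotes or backslashes are rejected
    # instead of being escaped.
    if "'" in value or "\\" in value:
        raise ValueError("value must not contain quotes or backslashes")
    return f"{column} = '{value}'"


def update_one_partition(
    spark,
    path: str,
    partition_col: str,
    partition_value: str,
    extra_condition: str,
    changes: Dict[str, str],
) -> None:
    """Run a Delta UPDATE whose condition is scoped to a single partition."""
    # Imported lazily so partition_predicate() works without Delta Lake installed.
    from delta.tables import DeltaTable

    condition = (
        f"{partition_predicate(partition_col, partition_value)} "
        f"AND ({extra_condition})"
    )
    # `changes` maps column names to SQL expression strings,
    # e.g. {"status": "'done'"}.
    DeltaTable.forPath(spark, path).update(condition=condition, set=changes)
```

Each notebook would then call update_one_partition with its own partition value, for example "2019-10-01" in one notebook and "2019-10-02" in another.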
