- 10546 Views
- 1 replies
- 0 kudos
Hi,
there is a function to delete data from a Delta table:
from delta.tables import DeltaTable
from pyspark.sql.functions import col

deltaTable = DeltaTable.forPath(spark, "/data/events/")
deltaTable.delete(col("date") < "2017-01-01")
But is there also a way to drop duplicates somehow, like deltaTable.dropDuplicates()?
Latest Reply
Hi @Christian Hofstetter, you can check here for info on the same: https://docs.delta.io/0.4.0/delta-update.html#data-deduplication-when-writing-into-delta-tables
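The linked doc describes deduplication on write via an insert-only merge: new rows are inserted only if no existing row matches the key, so duplicates never land in the table. A minimal sketch of that pattern, assuming an active SparkSession with Delta Lake configured, incoming rows in a DataFrame named newData, and a unique eventId column (both names are illustrative, not from the question):

```python
from delta.tables import DeltaTable

# Assumed: `spark` session with Delta Lake, `newData` DataFrame with an
# `eventId` column that uniquely identifies each event.
deltaTable = DeltaTable.forPath(spark, "/data/events/")

(
    deltaTable.alias("events")
    .merge(
        newData.alias("updates"),
        "events.eventId = updates.eventId",
    )
    # Insert-only merge: rows whose eventId already exists are skipped,
    # so re-running the same batch does not create duplicates.
    .whenNotMatchedInsertAll()
    .execute()
)
```

Note this prevents new duplicates from being written; it does not remove duplicates already present in the table.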
- 7882 Views
- 1 replies
- 1 kudos
Hello All,
We have data in a column in a PySpark DataFrame having an array of struct type with multiple nested fields present. If the value is not blank, it will save the data in the same array of struct type in a Spark Delta table. Please advise on the bel...
Latest Reply
Hi @Anbazhagan anbutech17, can you please try it as in the answers here: https://stackoverflow.com/questions/56942683/how-to-add-null-columns-to-complex-array-struct-in-spark-with-a-udf
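The linked answer uses a UDF to rebuild the array of structs. On Spark 3.1+ the same result can be had without a UDF, using transform() with Column.withField(). A hedged sketch, assuming a DataFrame df with a column items of type array<struct<name:string, qty:int>> (schema and column names are illustrative, not from the question):

```python
from pyspark.sql import functions as F

# Assumed schema: items: array<struct<name:string, qty:int>>.
# Add a nullable "price" field to every struct element in the array
# (requires Spark 3.1+ for transform() combined with withField()).
df2 = df.withColumn(
    "items",
    F.transform(
        F.col("items"),
        lambda x: x.withField("price", F.lit(None).cast("double")),
    ),
)
```

The built-in approach avoids the serialization overhead of a Python UDF; the UDF route from the linked answer remains the fallback on older Spark versions.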
- 6257 Views
- 2 replies
- 0 kudos
Hi,
We got the following error when we tried to UPDATE a Delta table while running concurrent notebooks that all end with an update to the same table.
"
com.databricks.sql.transaction.tahoe.ConcurrentAppendException: Files were added matching 'true' by a ...
Latest Reply
Hi @matt@direction.consulting
I just found the following doc: https://docs.azuredatabricks.net/delta/isolation-level.html#set-the-isolation-level.
In my case, I fixed it by partitioning the table, and I think that is the only way for concurrent update in t...
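The partitioning fix works because Delta's conflict detection is per-file: if the table is partitioned and each concurrent UPDATE is scoped to a disjoint partition, the transactions touch disjoint files and no ConcurrentAppendException is raised. A minimal sketch, assuming an active SparkSession with Delta Lake and illustrative table/column names ("events", "date", "status" are assumptions, not from the thread):

```python
from delta.tables import DeltaTable

# One-time setup (assumed): write the table partitioned by the column
# that separates the concurrent workloads.
df.write.format("delta").partitionBy("date").saveAsTable("events")

# In each concurrent notebook, pin the UPDATE to its own partition so
# the condition makes the touched files disjoint across notebooks.
deltaTable = DeltaTable.forName(spark, "events")
deltaTable.update(
    condition="date = '2023-01-01' AND status = 'pending'",
    set={"status": "'done'"},
)
```

Each notebook would use a different date literal; updates that cannot be confined to disjoint partitions generally need retry logic around the write instead.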