Topics with Label: Delta table

Forum Posts

by ChristianHofste • New Contributor II

05-07-2020 4:21:37 AM

10546 Views
1 replies
0 kudos

Drop duplicates in Table

Hi, there is a function to delete data from a Delta table:

deltaTable = DeltaTable.forPath(spark, "/data/events/")
deltaTable.delete(col("date") < "2017-01-01")

But is there also a way to drop duplicates somehow? Like deltaTable.dropDuplicates()......

Data Engineering

Latest Reply

shyam_9
Valued Contributor

05-19-2020 2:55:34 PM

0 kudos

Hi @Christian Hofstetter, you can find information on this here: https://docs.delta.io/0.4.0/delta-update.html#data-deduplication-when-writing-into-delta-tables
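For readers who want something concrete, here is a minimal sketch of two ways this is commonly handled. It is not taken from the reply above. DeltaTable has no dropDuplicates() method that I know of, so the first function follows the merge-based pattern the linked page describes (insert only rows whose key is not already in the table), and the second rewrites an existing table from a deduplicated read. The function names, the key columns, and the "target"/"source" aliases are all hypothetical, and the sketch assumes a Spark session with Delta Lake available.

```python
def merge_condition(keys, target="target", source="source"):
    """Build an equality join condition over the (hypothetical) key columns."""
    return " AND ".join(f"{target}.{k} = {source}.{k}" for k in keys)


def insert_only_new_rows(spark, path, new_rows, keys):
    """Insert rows from new_rows whose keys are not yet in the Delta table at path."""
    # Imported lazily so merge_condition can be used without Delta Lake installed.
    from delta.tables import DeltaTable

    # The merge only guards against rows already in the table, so duplicates
    # inside the incoming batch are dropped first.
    deduped = new_rows.dropDuplicates(keys)

    (DeltaTable.forPath(spark, path).alias("target")
        .merge(deduped.alias("source"), merge_condition(keys))
        .whenNotMatchedInsertAll()
        .execute())


def rewrite_without_duplicates(spark, path, keys):
    """Overwrite the Delta table at path with one row per key.

    Assumption: keeping an arbitrary row per key is acceptable, since
    dropDuplicates does not let you choose which duplicate survives.
    """
    deduped = spark.read.format("delta").load(path).dropDuplicates(keys)
    deduped.write.format("delta").mode("overwrite").save(path)
```

With a key column called uniqueId (again, just an example name), the first approach would be called as insert_only_new_rows(spark, "/data/events/", new_df, ["uniqueId"]). The merge keeps new duplicates out; the rewrite is the heavier option for a table that already contains them.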

by Anbazhagananbut • New Contributor II

04-07-2020 11:14:28 PM

7882 Views
1 replies
1 kudos

How to handle Blank values in Array of struct elements in pyspark

Hello All, We have data in a column of a PySpark DataFrame that is an array of struct type with multiple nested fields. If the value is not blank, it will save the data in the same array of struct type in a Spark Delta table. Please advise on the bel...

Data Engineering

Latest Reply

shyam_9
Valued Contributor

04-15-2020 12:05:23 PM

1 kudos

Hi @Anbazhagan anbutech17, can you please try the approach from the answers here: https://stackoverflow.com/questions/56942683/how-to-add-null-columns-to-complex-array-struct-in-spark-with-a-udf

by GuidoPereyra_ • New Contributor II

10-30-2018 10:35:01 AM

6257 Views
2 replies
0 kudos

Databricks Delta - UPDATE error

Hi, We got the following error when we tried to UPDATE a Delta table from concurrent notebooks that all end with an update to the same table: "com.databricks.sql.transaction.tahoe.ConcurrentAppendException: Files were added matching 'true' by a ...

Data Engineering

Latest Reply

GuidoPereyra_
New Contributor II

06-21-2019 7:10:13 AM

0 kudos

Hi @matt@direction.consulting, I just found the following doc: https://docs.azuredatabricks.net/delta/isolation-level.html#set-the-isolation-level. In my case, I could fix it by partitioning the table, and I think that is the only way to do concurrent updates in t...
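As a rough illustration of the partitioning fix mentioned here (a sketch, not the poster's actual code): the idea is that each notebook updates only its own partition and says so explicitly in the update condition, so Delta can tell that the concurrent writers touch disjoint files. The function names, the partition column, and the assignments below are hypothetical, and whether this avoids the conflict in a given setup also depends on the Delta version and isolation level in use.

```python
def partition_scoped_condition(partition_col, partition_value, extra=None):
    """Return an UPDATE predicate pinned to a single partition.

    Assumption: the partition value is a string or date literal, so it is
    wrapped in single quotes; numeric partitions would need no quoting.
    """
    cond = f"{partition_col} = '{partition_value}'"
    return f"{cond} AND ({extra})" if extra else cond


def update_one_partition(spark, path, partition_col, partition_value,
                         assignments, extra=None):
    """Update rows in one partition of the Delta table at path.

    assignments maps column names to SQL expression strings,
    for example {"status": "'closed'"}.
    """
    # Imported lazily so the predicate helper works without Delta Lake installed.
    from delta.tables import DeltaTable

    DeltaTable.forPath(spark, path).update(
        condition=partition_scoped_condition(partition_col, partition_value, extra),
        set=assignments,
    )
```

The 'true' in the error message suggests the failing update was treated as reading the whole table, which is why narrowing the condition to one partition per notebook is the usual first thing to try.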
