Append to a record when updating Delta Table
09-21-2022 03:40 AM
I am updating the Delta table in Databricks as follows:
segments_data.alias('segments_old').merge(
    segments_data_new.alias("updates"),
    "segments_old.source_url = updates.source_url",
).whenMatchedUpdate(
    set={"segments": "segments_old.segments" + "updates.segments"}  # <- the line is pseudocode, don't know correct API
).whenNotMatchedInsertAll().execute()
where "segments" is a list of StructType:
StructField(
    "segments",
    StructType(
        [ ....
How can I append to the existing list, namely append to the "segments" column in segments_old from updates within "whenMatchedUpdate"?
Instead of replacing (say, "segments": "updates.segments"), I would like to append: "segments": "segments_old.segments" + "updates.segments".
Thanks in advance for any suggestions!
Labels: Delta Lake Upsert, Structfield
09-21-2022 12:21 PM
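"set" is the equivalent of SQL UPDATE ... SET: each key names a column in the target table, and each value is an expression that produces that column's new value.
So you can use built-in functions such as col or expr. Note that arithmetic + does not concatenate array columns in Spark, so use concat to join the two arrays:
from pyspark.sql.functions import col, concat, expr
...
set = { 'segments': concat(col('segments_old.segments'), col('updates.segments')) }
OR
set = { 'segments': expr('concat(segments_old.segments, updates.segments)') }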
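For completeness, here is a minimal sketch of the whole merge, assuming "segments" is an array column and that appending means array concatenation via Spark's concat. The helper names array_append_set and merge_appending are made up for illustration and are not part of the Delta API; delta_table is assumed to be a DeltaTable and updates_df a DataFrame with the same schema. The "set" values are passed as SQL expression strings, which whenMatchedUpdate accepts alongside Column objects.

```python
def array_append_set(columns, target="segments_old", source="updates"):
    """Build a `set` mapping that appends each source array to the target array.

    Hypothetical helper: returns SQL expression strings, one per column.
    """
    return {c: f"concat({target}.{c}, {source}.{c})" for c in columns}


def merge_appending(delta_table, updates_df, key="source_url",
                    array_columns=("segments",)):
    """Upsert updates_df into delta_table, appending array columns on match.

    Hypothetical wrapper around the merge from the question.
    """
    target, source = "segments_old", "updates"
    (
        delta_table.alias(target)
        .merge(updates_df.alias(source), f"{target}.{key} = {source}.{key}")
        # On a key match, replace each array column with old + new elements.
        .whenMatchedUpdate(set=array_append_set(array_columns, target, source))
        # Rows with a new key are inserted unchanged.
        .whenNotMatchedInsertAll()
        .execute()
    )


print(array_append_set(["segments"]))
```

Called as merge_appending(segments_data, segments_data_new), this should behave like the merge in the question with the pseudocode line replaced. One caveat: concat returns NULL if either array is NULL, so rows where the existing "segments" value is NULL may need separate handling.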
My blog: https://databrickster.medium.com/
10-02-2022 11:24 PM
Hi @Sergii Ivakhno
Hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.
We'd love to hear from you.
Thanks!
11-01-2022 09:58 AM
Yes it worked for us - thanks 🙂