SQL merge to update one of the nested column
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
10-03-2024 06:17 PM
I am having existing delta-lake as target, and the small set of records at hand as CURRENT_BATCH,
I have a requirement to update dateTimeUpdated column inside parent2, using following merge query.
========
MERGE INTO mydataset AS target
USING CURRENT_BATCH AS incoming
ON target.parent1.companyId=incoming.parent1.companyId AND target.parent1.id=incoming.parent1.id AND incoming.is_deleted <> 'N'
WHEN MATCHED AND incoming.parent2.dateTimeUpdated > target.parent2.dateTimeUpdated
THEN UPDATE SET
target.is_deleted = incoming.is_deleted,
target.parent2.dateTimeUpdated = incoming.parent2.dateTimeUpdated
WHEN NOT MATCHED THEN INSERT *
======
However this query fails giving me following error
[DELTA_MERGE_UNEXPECTED_ASSIGNMENT_KEY]
Unexpected assignment key: class org.apache.spark.sql.catalyst.plans.logical.Assignment -
assignment(parent2#6623.dateTimeUpdated, parent2#6664.dateTimeUpdated)
It works fine if I update directly parent i.e target.parent2 = incoming.parent2, but I just need to update one of the nested columns, how do I do that?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
10-03-2024 07:04 PM
I am using Apache Spark 3.5.1 and deltalake version 3.1.0
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
10-03-2024 11:32 PM
Hi @SagarJi ,
According to the documentation updates to the nested columns are not supported:
What you can do you can construct the whole struct and update the parent:
MERGE INTO mydataset AS target
USING CURRENT_BATCH AS incoming
ON target.parent1.companyId = incoming.parent1.companyId
AND target.parent1.id = incoming.parent1.id
AND incoming.is_deleted <> 'N'
WHEN MATCHED AND incoming.parent2.dateTimeUpdated > target.parent2.dateTimeUpdated
THEN UPDATE SET
target.is_deleted = incoming.is_deleted,
target.parent2 = struct(
target.parent2.field1 AS field1, -- Replace 'field1' with actual field names
target.parent2.field2 AS field2, -- Replace 'field2' with actual field names
-- Include all other fields from parent2 here, keeping their values unchanged
incoming.parent2.dateTimeUpdated AS dateTimeUpdated -- Update this field
)
WHEN NOT MATCHED THEN INSERT *