SQL merge to update one of the nested column

SagarJi
New Contributor II

 

I am having existing delta-lake as target, and the small set of records at hand as CURRENT_BATCH,
I have a requirement to update dateTimeUpdated column inside parent2, using following merge query.
========

MERGE INTO mydataset AS target
USING CURRENT_BATCH AS incoming
ON target.parent1.companyId=incoming.parent1.companyId AND target.parent1.id=incoming.parent1.id AND incoming.is_deleted <> 'N'
WHEN MATCHED AND incoming.parent2.dateTimeUpdated > target.parent2.dateTimeUpdated
THEN UPDATE SET
target.is_deleted = incoming.is_deleted,
target.parent2.dateTimeUpdated = incoming.parent2.dateTimeUpdated
WHEN NOT MATCHED THEN INSERT *
======

However this query fails giving me following error

[DELTA_MERGE_UNEXPECTED_ASSIGNMENT_KEY]
Unexpected assignment key: class org.apache.spark.sql.catalyst.plans.logical.Assignment -
assignment(parent2#6623.dateTimeUpdated, parent2#6664.dateTimeUpdated)

It works fine if I update directly parent i.e target.parent2 = incoming.parent2, but I just need to update one of the nested columns, how do I do that?

 

SagarJi
New Contributor II

I am using Apache Spark 3.5.1 and deltalake version 3.1.0

filipniziol
Esteemed Contributor

Hi @SagarJi ,

According to the documentation updates to the nested columns are not supported:

filipniziol_0-1728023452259.png

What you can do you can construct the whole struct and update the parent:

MERGE INTO mydataset AS target
USING CURRENT_BATCH AS incoming
ON target.parent1.companyId = incoming.parent1.companyId 
   AND target.parent1.id = incoming.parent1.id 
   AND incoming.is_deleted <> 'N'
WHEN MATCHED AND incoming.parent2.dateTimeUpdated > target.parent2.dateTimeUpdated
THEN UPDATE SET
    target.is_deleted = incoming.is_deleted,
    target.parent2 = struct(
        target.parent2.field1 AS field1, -- Replace 'field1' with actual field names
        target.parent2.field2 AS field2, -- Replace 'field2' with actual field names
        -- Include all other fields from parent2 here, keeping their values unchanged
        incoming.parent2.dateTimeUpdated AS dateTimeUpdated -- Update this field
    )
WHEN NOT MATCHED THEN INSERT *