Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

SQL merge to update one of the nested columns

SagarJi
New Contributor II

 

I have an existing Delta Lake table as the target and a small set of records at hand as CURRENT_BATCH. I need to update the dateTimeUpdated column inside parent2 using the following merge query.
========

MERGE INTO mydataset AS target
USING CURRENT_BATCH AS incoming
ON target.parent1.companyId=incoming.parent1.companyId AND target.parent1.id=incoming.parent1.id AND incoming.is_deleted <> 'N'
WHEN MATCHED AND incoming.parent2.dateTimeUpdated > target.parent2.dateTimeUpdated
THEN UPDATE SET
target.is_deleted = incoming.is_deleted,
target.parent2.dateTimeUpdated = incoming.parent2.dateTimeUpdated
WHEN NOT MATCHED THEN INSERT *
========

However, this query fails with the following error:

[DELTA_MERGE_UNEXPECTED_ASSIGNMENT_KEY]
Unexpected assignment key: class org.apache.spark.sql.catalyst.plans.logical.Assignment -
assignment(parent2#6623.dateTimeUpdated, parent2#6664.dateTimeUpdated)

It works fine if I update the parent directly, i.e. target.parent2 = incoming.parent2, but I only need to update one of the nested columns. How do I do that?
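
For reference, a minimal sketch of a target table shape that matches the merge above. Only companyId, id, dateTimeUpdated and is_deleted appear in the query; the column types and the extra parent2 fields (otherField1, otherField2) are assumptions for illustration only.

-- Hypothetical schema, assumed for illustration; not the actual table definition
CREATE TABLE mydataset (
    parent1 STRUCT<companyId: STRING, id: STRING>,
    parent2 STRUCT<dateTimeUpdated: TIMESTAMP, otherField1: STRING, otherField2: STRING>,
    is_deleted STRING
) USING DELTA;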

 

2 REPLIES

SagarJi
New Contributor II

I am using Apache Spark 3.5.1 and Delta Lake version 3.1.0.

filipniziol
Contributor III

Hi @SagarJi,

According to the documentation, updates to nested columns are not supported:

[screenshot of the documentation on MERGE update restrictions]

What you can do is construct the whole struct and update the parent:

MERGE INTO mydataset AS target
USING CURRENT_BATCH AS incoming
ON target.parent1.companyId = incoming.parent1.companyId 
   AND target.parent1.id = incoming.parent1.id 
   AND incoming.is_deleted <> 'N'
WHEN MATCHED AND incoming.parent2.dateTimeUpdated > target.parent2.dateTimeUpdated
THEN UPDATE SET
    target.is_deleted = incoming.is_deleted,
    target.parent2 = struct(
        target.parent2.field1 AS field1, -- Replace 'field1' with actual field names
        target.parent2.field2 AS field2, -- Replace 'field2' with actual field names
        -- Include all other fields from parent2 here, keeping their values unchanged
        incoming.parent2.dateTimeUpdated AS dateTimeUpdated -- Update this field
    )
WHEN NOT MATCHED THEN INSERT *
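
The same reconstruction can also be written with named_struct, which makes the field names explicit instead of relying on column aliases. This is still a sketch: the field list must match your actual parent2 schema, and field1/field2 are placeholders.

THEN UPDATE SET
    target.is_deleted = incoming.is_deleted,
    target.parent2 = named_struct(
        'field1', target.parent2.field1,   -- keep existing values for untouched fields
        'field2', target.parent2.field2,
        'dateTimeUpdated', incoming.parent2.dateTimeUpdated  -- only this field changes
    )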


