Databricks Community

VaDim · ‎09-08-2022

I can't seem to make it work as I keep getting:

DeltaInvariantViolationException: NOT NULL constraint violated for column: dl_id.

-werners- · ‎09-08-2022

I was just typing this workaround 🙂 Because the id col is defined as GENERATED ALWAYS AS IDENTITY it can never be updated. If you then use UpdateAll it will generate an error.

remove the Id col from the columns to be updated.

This is the way.

View solution in original post

-werners- · ‎09-08-2022

I would think it is supported as that is the whole purpose of the ID column, not providing the value yourself.

I haven't tested it though.

But what version of databricks do you use? You need a pretty recent version (10.4+).

Also: how is the identity column created?

VaDim · ‎09-08-2022

I use 11.2 runtime. identity column created as:

dl_id   BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY

and the pyspark code is an upsert:

t.alias("ec").merge(df.alias("uc"), "ec.dl_id = uc.dl_id") \
            .whenMatchedUpdateAll() \
            .whenNotMatchedInsertAll() \
            .execute()

-werners- · ‎09-08-2022

Hm, and both ec and uc have id's always filled in?

VaDim · ‎09-08-2022

`ec` - always. `uc` - not. because `uc` might contain new data that needs appended.

-werners- · ‎09-08-2022

I think that is the issue. uc contains a column which is supposed to be an id, but you cannot pass it to the merge as it is generated.

What happens if you merge on the natural key?

VaDim · ‎09-08-2022

I haven't tried but I suspect it will fail with the same message on INSERT because uc.dl_id is NULL for some rows and `whenNotMatchedInsertAll` will attempt to insert a value for dl_id field instead of generating one (as if it has been user provided).

In the meantime I found a workaround: explicitly set the column mapping and do not include one for `dl_id`.

cols2update is a dict with column mapping except dl_id, it works.

t.alias("ec").merge(df.alias("uc"), "ec.dl_id = uc.dl_id") \
            .whenMatchedUpdate(set=cols2update) \
            .whenNotMatchedInsert(values=cols2update) \
            .execute()

-werners- · ‎09-08-2022

I was just typing this workaround 🙂 Because the id col is defined as GENERATED ALWAYS AS IDENTITY it can never be updated. If you then use UpdateAll it will generate an error.

remove the Id col from the columns to be updated.

This is the way.

byrdman · ‎09-08-2022

if you are using 'delta.columnMapping.mode' = 'name' on your table i could not get it to work, without that line .. for the not matched .. WHEN NOT MATCHED

THEN INSERT (columnname,columnName2) values(columnname,columnName2)

WHEN MATCHED

Then UPDATE SET (target.columnname = source.columnname,target.columnname2 = source.columnname2) That worked for me.. Leave the id column out of any of these

Databricks Community

Are MERGE INTO inserts supported when the delta table has an identity column ?

Congratulations Databricks Partners! You're Now Officially Recognized in the Databricks Community

Solution Accelerator Series | Measure Ad Effectiveness With Multi-Touch Attribution

Govern AI Spend at Scale: A Data-Driven Approach to AI Governance | Webinar

Databricks AMER Learning Festival | Virtual Training

Introducing the Genie Hub: Ask Questions, Share Builds, and Master Conversational Analytics