Databricks Community

VaDim · ‎09-08-2022

I can't seem to make it work as I keep getting:

DeltaInvariantViolationException: NOT NULL constraint violated for column: dl_id.

-werners- · ‎09-08-2022

I was just typing this workaround 🙂 Because the id col is defined as GENERATED ALWAYS AS IDENTITY it can never be updated. If you then use UpdateAll it will generate an error.

remove the Id col from the columns to be updated.

This is the way.

View solution in original post

-werners- · ‎09-08-2022

I would think it is supported as that is the whole purpose of the ID column, not providing the value yourself.

I haven't tested it though.

But what version of databricks do you use? You need a pretty recent version (10.4+).

Also: how is the identity column created?

VaDim · ‎09-08-2022

I use 11.2 runtime. identity column created as:

dl_id   BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY

and the pyspark code is an upsert:

t.alias("ec").merge(df.alias("uc"), "ec.dl_id = uc.dl_id") \
            .whenMatchedUpdateAll() \
            .whenNotMatchedInsertAll() \
            .execute()

-werners- · ‎09-08-2022

Hm, and both ec and uc have id's always filled in?

VaDim · ‎09-08-2022

`ec` - always. `uc` - not. because `uc` might contain new data that needs appended.

-werners- · ‎09-08-2022

I think that is the issue. uc contains a column which is supposed to be an id, but you cannot pass it to the merge as it is generated.

What happens if you merge on the natural key?

VaDim · ‎09-08-2022

I haven't tried but I suspect it will fail with the same message on INSERT because uc.dl_id is NULL for some rows and `whenNotMatchedInsertAll` will attempt to insert a value for dl_id field instead of generating one (as if it has been user provided).

In the meantime I found a workaround: explicitly set the column mapping and do not include one for `dl_id`.

cols2update is a dict with column mapping except dl_id, it works.

t.alias("ec").merge(df.alias("uc"), "ec.dl_id = uc.dl_id") \
            .whenMatchedUpdate(set=cols2update) \
            .whenNotMatchedInsert(values=cols2update) \
            .execute()

-werners- · ‎09-08-2022

I was just typing this workaround 🙂 Because the id col is defined as GENERATED ALWAYS AS IDENTITY it can never be updated. If you then use UpdateAll it will generate an error.

remove the Id col from the columns to be updated.

This is the way.

byrdman · ‎09-08-2022

if you are using 'delta.columnMapping.mode' = 'name' on your table i could not get it to work, without that line .. for the not matched .. WHEN NOT MATCHED

THEN INSERT (columnname,columnName2) values(columnname,columnName2)

WHEN MATCHED

Then UPDATE SET (target.columnname = source.columnname,target.columnname2 = source.columnname2) That worked for me.. Leave the id column out of any of these

Databricks Community

Are MERGE INTO inserts supported when the delta table has an identity column ?

Connect with Databricks Users in Your Area

Databricks Learning Festival (Virtual): 15 January - 31 January 2025

How to present and share your Notebook insights in AI/BI Dashboards

Introducing an exclusively Databricks-hosted Assistant

Meet the Databricks MVPs

Insights from a global survey of 1,100 technologists and interviews with 28 CIOs