Are MERGE INTO inserts supported when the Delta table has an identity column?

VaDim
New Contributor III

I can't seem to make it work as I keep getting:

DeltaInvariantViolationException: NOT NULL constraint violated for column: dl_id.

1 ACCEPTED SOLUTION

Accepted Solutions

-werners-
Esteemed Contributor III

I was just typing this workaround 🙂 Because the id column is defined as GENERATED ALWAYS AS IDENTITY, it can never be updated, so using UpdateAll generates an error.

Remove the id column from the set of columns to be updated.

This is the way.



-werners-
Esteemed Contributor III

I would think it is supported, as that is the whole point of an identity column: not providing the value yourself.

I haven't tested it though.

Which Databricks runtime version do you use? You need a fairly recent one (10.4+).

Also: how is the identity column created?

VaDim
New Contributor III

I use the 11.2 runtime. The identity column is created as:

dl_id   BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY

and the PySpark code is an upsert, where t is the target DeltaTable and df is the source DataFrame:

t.alias("ec").merge(df.alias("uc"), "ec.dl_id = uc.dl_id") \
            .whenMatchedUpdateAll() \
            .whenNotMatchedInsertAll() \
            .execute()
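For context, the full table definition behind this might look something like the following. This is a sketch: only the dl_id definition comes from the thread; the table name and the other columns are assumptions.

```sql
-- Hypothetical DDL; only dl_id's definition is from the thread.
CREATE TABLE events (
  dl_id      BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY,
  name       STRING,
  updated_at TIMESTAMP
) USING DELTA;
```

Because dl_id is GENERATED ALWAYS, Delta itself must supply its values; any write that tries to provide them explicitly will be rejected.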

-werners-
Esteemed Contributor III

Hm, and do both ec and uc always have their ids filled in?

VaDim
New Contributor III

`ec` - always. `uc` - not, because `uc` might contain new data that needs to be appended.

-werners-
Esteemed Contributor III

I think that is the issue: uc contains a column that is supposed to be an id, but you cannot pass it to the merge because it is generated.

What happens if you merge on the natural key?

VaDim
New Contributor III

I haven't tried that, but I suspect it will fail with the same message on INSERT, because uc.dl_id is NULL for some rows and `whenNotMatchedInsertAll` will attempt to insert a value for the dl_id field instead of generating one (as if it had been user-provided).

In the meantime I found a workaround: explicitly set the column mapping and do not include an entry for `dl_id`.

With cols2update as a dict mapping every column except dl_id, this works:

t.alias("ec").merge(df.alias("uc"), "ec.dl_id = uc.dl_id") \
            .whenMatchedUpdate(set=cols2update) \
            .whenNotMatchedInsert(values=cols2update) \
            .execute()
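For illustration, cols2update could be built from the source DataFrame's column list while skipping the identity column. This is a minimal sketch under assumptions: the thread does not show how the dict was constructed, and the helper name and sample columns here are hypothetical (in practice you would pass df.columns).

```python
# Build the update/insert mapping from a list of column names,
# excluding the generated identity column so Delta fills it in itself.
def build_column_mapping(columns, identity_col="dl_id", source_alias="uc"):
    # Hypothetical helper: maps target column -> source expression.
    return {c: f"{source_alias}.{c}" for c in columns if c != identity_col}

# Sample column names for illustration; use df.columns in real code.
cols2update = build_column_mapping(["dl_id", "name", "updated_at"])
# → {"name": "uc.name", "updated_at": "uc.updated_at"}
```

The same dict can then be passed to both whenMatchedUpdate(set=...) and whenNotMatchedInsert(values=...), as in the snippet above.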


byrdman
New Contributor III

If you are using 'delta.columnMapping.mode' = 'name' on your table, I could not get it to work without listing the columns explicitly, along these lines:

WHEN NOT MATCHED
  THEN INSERT (columnname, columnname2) VALUES (source.columnname, source.columnname2)
WHEN MATCHED
  THEN UPDATE SET target.columnname = source.columnname, target.columnname2 = source.columnname2

That worked for me. Leave the id column out of any of these.
