โ09-08-2022 02:33 AM
I can't seem to make it work as I keep getting:
DeltaInvariantViolationException: NOT NULL constraint violated for column: dl_id.
โ09-08-2022 03:43 AM
I was just typing this workaround ๐ Because the id col is defined as GENERATED ALWAYS AS IDENTITY it can never be updated. If you then use UpdateAll it will generate an error.
remove the Id col from the columns to be updated.
This is the way.
โ09-08-2022 02:49 AM
I would think it is supported as that is the whole purpose of the ID column, not providing the value yourself.
I haven't tested it though.
But what version of databricks do you use? You need a pretty recent version (10.4+).
Also: how is the identity column created?
โ09-08-2022 03:08 AM
I use 11.2 runtime. identity column created as:
dl_id BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY
and the pyspark code is an upsert:
t.alias("ec").merge(df.alias("uc"), "ec.dl_id = uc.dl_id") \
.whenMatchedUpdateAll() \
.whenNotMatchedInsertAll() \
.execute()
โ09-08-2022 03:21 AM
Hm, and both ec and uc have id's always filled in?
โ09-08-2022 03:25 AM
`ec` - always. `uc` - not. because `uc` might contain new data that needs appended.
โ09-08-2022 03:29 AM
I think that is the issue. uc contains a column which is supposed to be an id, but you cannot pass it to the merge as it is generated.
What happens if you merge on the natural key?
โ09-08-2022 03:41 AM
I haven't tried but I suspect it will fail with the same message on INSERT because uc.dl_id is NULL for some rows and `whenNotMatchedInsertAll` will attempt to insert a value for dl_id field instead of generating one (as if it has been user provided).
In the meantime I found a workaround: explicitly set the column mapping and do not include one for `dl_id`.
cols2update is a dict with column mapping except dl_id, it works.
t.alias("ec").merge(df.alias("uc"), "ec.dl_id = uc.dl_id") \
.whenMatchedUpdate(set=cols2update) \
.whenNotMatchedInsert(values=cols2update) \
.execute()
โ09-08-2022 03:43 AM
I was just typing this workaround ๐ Because the id col is defined as GENERATED ALWAYS AS IDENTITY it can never be updated. If you then use UpdateAll it will generate an error.
remove the Id col from the columns to be updated.
This is the way.
โ09-08-2022 06:35 PM
if you are using 'delta.columnMapping.mode' = 'name' on your table i could not get it to work, without that line .. for the not matched .. WHEN NOT MATCHED
THEN INSERT (columnname,columnName2) values(columnname,columnName2)
WHEN MATCHED
Then UPDATE SET (target.columnname = source.columnname,target.columnname2 = source.columnname2) That worked for me.. Leave the id column out of any of these
Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโt want to miss the chance to attend and share knowledge.
If there isnโt a group near you, start one and help create a community that brings people together.
Request a New Group