I'm trying to load data with DLT using SCD Type 1 and am running into this error: "Detected a data update in the source table at version x. This is currently not supported. If you'd like to ignore updates, set the option 'ignoreChanges' to 'true'."
I have one DLT table named charter and another named branch. The branch table has a relationship to charter, so I join to it in the pipeline to look up the charter_key. The initial load runs just fine, but when I run the incremental load the next day, I get the error above. This data is fairly static, so no data actually changed between the initial load and the incremental load. However, in the backing table DLT creates for charter, __apply_changes_storage_charter, the __UpsertVersion column has the most recent file name and an updated __Timestamp. Is that where it is detecting a change?
My code from the DLT pipeline is below. The _bronze tables just load the raw parquet files into temporary streaming live tables, so I am not including that code.
CREATE OR REFRESH STREAMING LIVE TABLE charter
(charter_key bigint GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1)
,charter_number int
,charter_name string
,charter_address1 string
,charter_address2 string
,charter_zip_code string);

APPLY CHANGES INTO LIVE.charter
FROM STREAM(LIVE.charter_bronze)
KEYS (charter_number)
SEQUENCE BY file_name
COLUMNS * EXCEPT (file_name);

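In case it matters: my understanding is that APPLY CHANGES defaults to SCD Type 1, and that the type can be spelled out explicitly, something like this (same statement, just with the STORED AS clause added):

```sql
APPLY CHANGES INTO LIVE.charter
FROM STREAM(LIVE.charter_bronze)
KEYS (charter_number)
SEQUENCE BY file_name
COLUMNS * EXCEPT (file_name)
STORED AS SCD TYPE 1;
```

I get the same behavior either way, so I've left it implicit in the pipeline.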
CREATE OR REFRESH TEMPORARY STREAMING LIVE TABLE branch_stage AS
SELECT c.charter_key
,b.branch_number
,b.branch_name
,b.branch_address1
,b.branch_address2
,b.branch_zip_code
,b.file_name
FROM STREAM(LIVE.charter) c
INNER JOIN STREAM(LIVE.branch_bronze) b
ON c.charter_number = b.charter_number;

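One thing I wondered about: because charter is an APPLY CHANGES target, its rows are updated in place rather than appended, so maybe reading it with STREAM() in this join is what trips the append-only check. Would rewriting the join to read charter as a complete (non-streaming) table, roughly like this, be the right approach?

```sql
-- Sketch: same branch_stage join, but reading charter as a complete table
CREATE OR REFRESH TEMPORARY STREAMING LIVE TABLE branch_stage AS
SELECT c.charter_key
      ,b.branch_number
      ,b.branch_name
      ,b.branch_address1
      ,b.branch_address2
      ,b.branch_zip_code
      ,b.file_name
FROM STREAM(LIVE.branch_bronze) b
INNER JOIN LIVE.charter c  -- no STREAM() around the APPLY CHANGES target
ON c.charter_number = b.charter_number;
```

I haven't tried this yet, so I may be misreading what the error is actually complaining about.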
CREATE OR REFRESH STREAMING LIVE TABLE branch
(branch_key bigint GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1)
,charter_key bigint
,branch_number int
,branch_name string
,branch_address1 string
,branch_address2 string
,branch_zip_code string);

APPLY CHANGES INTO LIVE.branch
FROM STREAM(LIVE.branch_stage)
KEYS (charter_key, branch_number)
SEQUENCE BY file_name
COLUMNS * EXCEPT (file_name);
I've read that DLT treats streaming sources as append-only, but then how exactly does SCD work? In the pipeline configuration I have both of these set to true:
pipelines.applyChangesPreviewEnabled
spark.databricks.delta.schema.autoMerge.enabled
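For reference, both are set in the configuration block of the pipeline's settings JSON, roughly like this:

```json
{
  "configuration": {
    "pipelines.applyChangesPreviewEnabled": "true",
    "spark.databricks.delta.schema.autoMerge.enabled": "true"
  }
}
```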
Any help is appreciated.