DLT apply_changes applies only deletes and inserts, not updates

Anske
New Contributor III

Hi,

I have a DLT pipeline that applies changes from a source table (cdctest_cdc_enriched) to a target table (cdctest) using the following code:

import dlt
from pyspark.sql.functions import col, expr

dlt.apply_changes(
    target = "cdctest",
    source = "cdctest_cdc_enriched",
    keys = ["ID"],
    sequence_by = col("tran_begin_time"),
    apply_as_deletes = expr("operation = 1"),
    except_column_list = ["operation", "tran_begin_time"],
    stored_as_scd_type = 1
)
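Not shown above: apply_changes also needs the target streaming table to be declared somewhere in the pipeline. A minimal sketch of that declaration, assuming the current dlt Python API (older runtimes expose it as create_streaming_live_table), looks like this:

import dlt

# Declare the streaming table that apply_changes() writes into.
dlt.create_streaming_table(
    name = "cdctest",
    comment = "Target of apply_changes from cdctest_cdc_enriched"
)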
Now if I add an 'update' row to the source table and update the pipeline, the update is not applied to the target table, even though it is present in the source table. The source table follows the structure of a CDC table in SQL Server.
A screenshot of the data in the source table is attached, as well as a screenshot of the target table not showing the updated data (don't be fooled by some of the '- updated' values in the test1 column; those are the result of a backfill).

Is this a known issue? Has anyone else had trouble applying update operations SQL Server style (where an update creates two records: one with the data before the update (operation 3) and one with the data after the update (operation 4))? Or am I doing something completely wrong?
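To make that concrete, this is roughly what a single UPDATE on the tracked table ends up looking like in the source table (illustrative values only; column names follow the pipeline above):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Illustrative only: one UPDATE on the tracked table yields two CDC rows that
# share the same key (ID) and typically the same tran_begin_time.
sample = spark.createDataFrame(
    [
        (1, "old value", 3, "2024-04-18 10:15:00"),  # operation 3: row image before the update
        (1, "new value", 4, "2024-04-18 10:15:00"),  # operation 4: row image after the update
    ],
    "ID int, test1 string, operation int, tran_begin_time string",
)
sample.show(truncate=False)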

6 REPLIES

Kaniz
Community Manager

Hi @Anske, It seems you’re encountering an issue with your Delta Live Tables (DLT) pipeline where updates from the source table are not being correctly applied to the target table.

Let’s troubleshoot this together!

  1. Pipeline Update Process: When you run a pipeline update in DLT, it starts a cluster with the correct configuration, discovers and validates all tables and views defined in the pipeline (checking for analysis errors such as invalid column names and missing dependencies), and then creates or refreshes each table with the most recent data available.

  2. Update Types: The behaviour of pipeline updates depends on the update type:

    • Refresh all: Updates all live tables to reflect the current state of their input data sources. For streaming tables, new rows are appended.
    • Full refresh all: Updates all live tables by attempting to clear all data from each table and then loading all data from the streaming source. (A full refresh can also be started through the Pipelines REST API; see the sketch after this list.)
    • Refresh selection: Similar to “refresh all,” but allows you to refresh only selected tables.
    • Full refresh selection: Similar to “full refresh all,” but for selected tables.
  3. Troubleshooting Steps:

  4. Validation:
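For reference, here is a minimal sketch of starting a full refresh programmatically with the Pipelines REST API (the workspace URL, token and pipeline ID below are placeholders you would need to fill in):

import requests

# Sketch only: start a full-refresh update of a DLT pipeline via the Pipelines API.
# Replace the placeholders with your workspace URL, a personal access token and
# the pipeline ID shown on the pipeline's details page.
DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"
PIPELINE_ID = "<pipeline-id>"

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/pipelines/{PIPELINE_ID}/updates",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"full_refresh": True},  # clear and reload every table in the pipeline
)
resp.raise_for_status()
print(resp.json())  # contains the update_id of the update that was started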

Please review these steps, and if the issue persists, provide additional details about your pipeline configuration and any error messages you encounter. We’ll continue troubleshooting from there! 😊

 

Anske
New Contributor III

Hi Kaniz, 

The DLT pipeline runs without errors, and all changes of type 1 and 2 (deletes and inserts) are applied correctly. But after checking the target table, it turned out that the updates were not reflected in the target. I have since created a workaround by deleting the rows in the source where operation = 3 (the update rows with the values before the update) and replacing all instances of 4 with the string 'UPDATE' in the operation column (I had to change the data type of the operation column to a string and set the apply_as_deletes parameter to compare against '1' instead of 1). This fixed it, and the pipeline now processes inserts, deletes and all updates.
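In code, the workaround amounts to something like the sketch below. Note this is only an illustration: in my case I changed the data in the source table itself, but the same preparation could be done in an intermediate DLT view (the view name cdctest_cdc_prepared is made up):

import dlt
from pyspark.sql.functions import col, expr, when

# Hypothetical intermediate view that mimics the workaround: drop the
# pre-update images (operation = 3) and turn the post-update images
# (operation = 4) into plain upserts by relabelling them as 'UPDATE'.
@dlt.view(name="cdctest_cdc_prepared")
def cdctest_cdc_prepared():
    return (
        dlt.read_stream("cdctest_cdc_enriched")
        .filter("operation <> 3")  # discard the before-update row images
        .withColumn(
            "operation",
            when(col("operation") == 4, "UPDATE")
            .otherwise(col("operation").cast("string")),
        )
    )

dlt.create_streaming_table("cdctest")

dlt.apply_changes(
    target = "cdctest",
    source = "cdctest_cdc_prepared",
    keys = ["ID"],
    sequence_by = col("tran_begin_time"),
    apply_as_deletes = expr("operation = '1'"),  # operation is a string now
    except_column_list = ["operation", "tran_begin_time"],
    stored_as_scd_type = 1
)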

 

Kaniz
Community Manager

Hi @Anske, That's great to hear that you were able to find a workaround for the issue you were facing with the DLT pipeline. Thank you for sharing the details of the workaround with us. It's always helpful to hear about the solutions that our users come up with. If you have any other questions or need further assistance, please feel free to reach out.

Anske
New Contributor III

Hi @Kaniz ,

Thanks for that. I actually would love some assistance. When I was at the Databricks Intelligence Day in early April, I asked the guy giving the workshop about mirroring some tables from a SQL Server application database to the Delta Lake with Databricks. He told me that Databricks will release a feature offering exactly this functionality in Q2 (he thought it would be May), and he advised me to reach out to our account contact for more info. I have tried reaching out by email to Claire Nicholl, who is supposed to be our responsible account executive (I was redirected to her by Anna Cumbelich). I did this on the 18th of April but am still waiting for a reply. Could you tell me anything about this new feature and/or redirect me to the right person?

Another question I have is: would it be possible to get some actual support on Databricks? I found the page listing the support plans (https://www.databricks.com/support), but there is no info on the page about costs or how to subscribe to any of the plans.

Kaniz
Community Manager

Hi @Anske,

  1. Mirroring Tables from SQL Server to Delta Lake with Databricks:
  2. Databricks Support Plans:
    • Databricks offers several support plans to cater to different needs. Let’s take a look at the available plans:
      • Business: Enhanced support for Databricks on permitted Cloud Service Providers and Databricks-powered services.
      • Enhanced: Production-level support with additional benefits, including proactive monitoring and escalation management.
      • Production: Mission-critical support for scenarios where a production system is down or severely impacted.
      • Single-Cloud Support: Support for platform services on a single-chosen Cloud Service Provider.
    • Each plan provides different levels of service, including response times, support hours, and technical contacts. You can find more details on the Databricks Support page.
    • If you’re interested in subscribing to a support plan, I recommend sending me your contact details at community@databricks.com for pricing information and subscription details.
 

Anske
New Contributor III

Hi @Kaniz ,

The mail for pricing info on the support plans has been sent.

With regard to the preview release of Delta Lake 3.0 Delta Universal Format (UniForm): I have read the release notes but fail to see how this helps in any way towards mirroring data from a SQL Server instance. Could you please explain?