DLT apply_changes applies only deletes and inserts, not updates

Anske
New Contributor III

Hi,

I have a DLT pipeline that applies changes from a source table (cdctest_cdc_enriched) to a target table (cdctest) using the following code:

import dlt
from pyspark.sql.functions import col, expr

dlt.apply_changes(
    target = "cdctest",
    source = "cdctest_cdc_enriched",
    keys = ["ID"],
    sequence_by = col("tran_begin_time"),
    apply_as_deletes = expr("operation = 1"),
    except_column_list = ["operation", "tran_begin_time"],
    stored_as_scd_type = 1
)
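Not shown above: apply_changes also needs the target streaming table to be declared somewhere in the pipeline. A minimal sketch of that declaration, assuming the current dlt Python API (older runtimes expose it as create_streaming_live_table), looks like this:

import dlt

# Declare the streaming table that apply_changes() writes into.
dlt.create_streaming_table(
    name = "cdctest",
    comment = "Target of apply_changes from cdctest_cdc_enriched"
)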
Now if I add an 'update' row to the source table and update the pipeline, the update is not applied to the target table, even though it is present in the source table. The source table follows the structure of a CDC table in SQL Server.
A screenshot of the data in the source table is attached, as well as a screenshot of the target table not showing the updated data (don't be fooled by some of the '- updated' values in the test1 column; those are the result of a backfill).

Is this a known issue? Has anyone else had trouble applying update operations SQL Server style (where an update creates two records: one with the data before the update (operation 3) and one with the data after the update (operation 4))? Or am I doing something completely wrong?
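To make that concrete, this is roughly what a single UPDATE on the tracked table ends up looking like in the source table (illustrative values only; column names follow the pipeline above):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Illustrative only: one UPDATE on the tracked table yields two CDC rows that
# share the same key (ID) and typically the same tran_begin_time.
sample = spark.createDataFrame(
    [
        (1, "old value", 3, "2024-04-18 10:15:00"),  # operation 3: row image before the update
        (1, "new value", 4, "2024-04-18 10:15:00"),  # operation 4: row image after the update
    ],
    "ID int, test1 string, operation int, tran_begin_time string",
)
sample.show(truncate=False)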

6 REPLIES

Kaniz
Community Manager

Hi @Anske, It seems you’re encountering an issue with your Delta Live Tables (DLT) pipeline where updates from the source table are not being correctly applied to the target table.

Let’s troubleshoot this together!

  1. Pipeline Update Process: When you run a pipeline update in DLT, it starts a cluster with the correct configuration, discovers and validates all tables and views defined in the pipeline (checking for analysis errors such as invalid column names and missing dependencies), and then creates or refreshes each table with the most recent data available.

  2. Update Types: The behaviour of pipeline updates depends on the update type:

    • Refresh all: Updates all live tables to reflect the current state of their input data sources. For streaming tables, new rows are appended.
    • Full refresh all: Updates all live tables by attempting to clear all data from each table and then loading all data from the streaming source. (A full refresh can also be started through the Pipelines REST API; see the sketch after this list.)
    • Refresh selection: Similar to “refresh all,” but allows you to refresh only selected tables.
    • Full refresh selection: Similar to “full refresh all,” but for selected tables.
  3. Troubleshooting Steps:

  4. Validation:
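For reference, here is a minimal sketch of starting a full refresh programmatically with the Pipelines REST API (the workspace URL, token and pipeline ID below are placeholders you would need to fill in):

import requests

# Sketch only: start a full-refresh update of a DLT pipeline via the Pipelines API.
# Replace the placeholders with your workspace URL, a personal access token and
# the pipeline ID shown on the pipeline's details page.
DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"
PIPELINE_ID = "<pipeline-id>"

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/pipelines/{PIPELINE_ID}/updates",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"full_refresh": True},  # clear and reload every table in the pipeline
)
resp.raise_for_status()
print(resp.json())  # contains the update_id of the update that was started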

Please review these steps, and if the issue persists, provide additional details about your pipeline configuration and any error messages you encounter. We’ll continue troubleshooting from there! 😊

 

Anske
New Contributor III

Hi Kaniz, 

The DLT pipeline runs without errors, and all changes of type 1 and 2 (deletes and inserts) are applied correctly. But after checking the target table, it turned out that the updates were not reflected in the target. I have since created a workaround by deleting the rows in the source where operation = 3 (the update rows with the values before the update) and replacing all instances of 4 with the string 'UPDATE' in the operation column (I had to change the data type of the operation column to a string and set the apply_as_deletes parameter to compare against '1' instead of 1). This fixed it, and the pipeline now processes inserts, deletes and all updates.
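In code, the workaround amounts to something like the sketch below. Note this is only an illustration: in my case I changed the data in the source table itself, but the same preparation could be done in an intermediate DLT view (the view name cdctest_cdc_prepared is made up):

import dlt
from pyspark.sql.functions import col, expr, when

# Hypothetical intermediate view that mimics the workaround: drop the
# pre-update images (operation = 3) and turn the post-update images
# (operation = 4) into plain upserts by relabelling them as 'UPDATE'.
@dlt.view(name="cdctest_cdc_prepared")
def cdctest_cdc_prepared():
    return (
        dlt.read_stream("cdctest_cdc_enriched")
        .filter("operation <> 3")  # discard the before-update row images
        .withColumn(
            "operation",
            when(col("operation") == 4, "UPDATE")
            .otherwise(col("operation").cast("string")),
        )
    )

dlt.create_streaming_table("cdctest")

dlt.apply_changes(
    target = "cdctest",
    source = "cdctest_cdc_prepared",
    keys = ["ID"],
    sequence_by = col("tran_begin_time"),
    apply_as_deletes = expr("operation = '1'"),  # operation is a string now
    except_column_list = ["operation", "tran_begin_time"],
    stored_as_scd_type = 1
)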

 

Kaniz
Community Manager

Hi @Anske, That's great to hear that you were able to find a workaround for the issue you were facing with the DLT pipeline. Thank you for sharing the details of the workaround with us. It's always helpful to hear about the solutions that our users come up with. If you have any other questions or need further assistance, please feel free to reach out.

Anske
New Contributor III

Hi @Kaniz ,

Thanks for that. I actually would love some assistance. When I was at the Databricks Intelligence Day in early April, I asked the guy giving the workshop about mirroring some tables from a SQL Server application database to the Delta Lake with Databricks. He told me that Databricks will release a feature offering exactly this functionality in Q2 (he thought it would be May), and he advised me to reach out to our account contact for more info. I have tried reaching out by email to Claire Nicholl, who is supposed to be our responsible account executive (I was redirected to her by Anna Cumbelich). I did this on the 18th of April but am still waiting for a reply. Could you tell me anything about this new feature and/or redirect me to the right person?

Another question I have is: would it be possible to get some actual support on Databricks? I found the page listing the support plans (https://www.databricks.com/support), but there is no info on the page about costs or how to subscribe to any of the plans.

Kaniz
Community Manager

Hi @Anske,

  1. Mirroring Tables from SQL Server to Delta Lake with Databricks:
  2. Databricks Support Plans:
    • Databricks offers several support plans to cater to different needs. Let’s take a look at the available plans:
      • Business: Enhanced support for Databricks on permitted Cloud Service Providers and Databricks-powered services.
      • Enhanced: Production-level support with additional benefits, including proactive monitoring and escalation management.
      • Production: Mission-critical support for scenarios where a production system is down or severely impacted.
      • Single-Cloud Support: Support for platform services on a single-chosen Cloud Service Provider.
    • Each plan provides different levels of service, including response times, support hours, and technical contacts. You can find more details on the Databricks Support page.
    • If you’re interested in subscribing to a support plan, I recommend sending me your contact details at community@databricks.com for pricing information and subscription details.
 

Anske
New Contributor III

Hi @Kaniz ,

The mail for pricing info on the support plans has been sent.

With regard to the preview release of Delta Lake 3.0 Delta Universal Format (UniForm): I have read the release notes but fail to see how this helps in any way towards mirroring data from a SQL Server instance. Could you please explain?