
Data Lineage with Apply Changes

Gareema
New Contributor III

Hello Team

I am using DLT. I am able to see the lineage for the normal load process. However, as soon as I use the APPLY CHANGES feature, the lineage breaks and I am no longer able to see the data lineage in the catalog when I open the table.

Is there any way that I can use APPLY CHANGES and have the lineage retained?

Code to load into the silver table:

(code attached as a screenshot: Gareema_0-1720379993368.png)
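
For reference, here is a minimal sketch of what a typical apply_changes load into a silver table looks like (the table, key, and column names below are placeholders, not the exact code from the screenshot):

import dlt
from pyspark.sql.functions import col

# Bronze CDC feed used as the streaming source (placeholder names)
@dlt.view
def customers_cdc():
    return spark.readStream.table("bronze_schema.customers_cdc")

# Target streaming table in the silver schema
dlt.create_streaming_table("customers_silver")

# Apply the change feed, keeping the latest record per key
# based on the sequence column
dlt.apply_changes(
    target="customers_silver",
    source="customers_cdc",
    keys=["id"],
    sequence_by=col("sequence_key"),
    stored_as_scd_type=1
)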

 


Gareema
New Contributor III

One point to add here: both tables are in different schemas, so there are different DLT pipelines and notebooks for them.

Kaniz_Fatma
Community Manager

Hi @Gareema! I understand your concern about the data lineage breaking when using the APPLY CHANGES feature in DLT (Delta Live Tables). This is a common issue, but there are a few ways to address it and maintain the lineage even after applying changes.

Approach 1: Use the MERGE Statement

Instead of using the APPLY CHANGES feature, you can use the MERGE statement to update the target table. The MERGE statement allows you to update, insert, or delete rows in the target table based on the changes in the source table while preserving the data lineage.

Here's an example of how you can use the MERGE statement to load data into a silver table:

from delta.tables import DeltaTable

# Read the source data
source_df = spark.read.format("delta").load("path/to/source/table")

# Get the target table
target_table = DeltaTable.forPath(spark, "path/to/silver/table")

# Merge the source data into the target table
target_table.alias("target") \
  .merge(source_df.alias("source"), "target.id = source.id") \
  .whenMatchedUpdateAll() \
  .whenNotMatchedInsertAll() \
  .execute()

Approach 2: Use the OVERWRITE Mode

Another option is to use 'OVERWRITE' mode instead of APPLY CHANGES. Overwrite mode replaces the entire table with the new data, which can help maintain the data lineage.

Here's an example of how you can use overwrite mode to load data into a silver table:
# Read the source data
source_df = spark.read.format("delta").load("path/to/source/table")

# Write the source data to the silver table in overwrite mode
source_df.write.format("delta").mode("overwrite").save("path/to/silver/table")
 
By using 'OVERWRITE' mode, you can ensure that the data lineage is preserved and still visible in the catalog after the update. Both of these approaches should help you maintain the data lineage as alternatives to the APPLY CHANGES feature.
 
If you have any further questions or need additional assistance, feel free to ask.

Gareema
New Contributor III

@Kaniz_Fatma Thank you for your response. However, in both approaches we are avoiding the APPLY CHANGES feature. In reality, I want to utilise that feature to keep only the latest record based on the sequence key.

With merge or overwrite I will not have the flexibility of keeping just the last updated row based on the sequence.

Gareema
New Contributor III

@Kaniz_Fatma: Is there any way this can be achieved, or can we expect this problem to be resolved in upcoming releases?
