Hello,
I've been struggling for two days with missing lineage information for the silver layer table, and I'm unsure what I'm doing incorrectly.
I have a DLT pipeline with DPM public preview enabled. Data is ingested from an S3 bucket into the bronze table. After that, I have defined some expectations for the silver table. Additionally, there is a quarantine table where records that do not meet the expectations for the silver table are placed. The silver table is defined to use SCD1. Here’s how the silver table is configured:
dlt.create_target_table(
    name="x.y.z",
    comment="Some comment",
    table_properties={"quality": "silver"},
    expect_all_or_drop={"exp": "x>1"},
)

dlt.apply_changes(
    target="x.y.z",
    source="x.x.z",
    keys=["id"],
    sequence_by=col("cdc_timestamp"),
    apply_as_deletes=expr("Op = 'D'"),
    except_column_list=["Op", "cdc_timestamp"],
    stored_as_scd_type=1,
)
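To make the quarantine setup concrete, here is a minimal sketch of how a quarantine predicate can be derived from the same expectations, assuming a record is quarantined when it fails at least one silver rule. The helper name quarantine_condition is hypothetical (it is not part of the DLT API), and NULL handling is deliberately ignored:

```python
# Hypothetical helper, not part of the DLT API: builds a SQL predicate that
# matches rows failing at least one of the given expectations.
def quarantine_condition(expectations: dict) -> str:
    if not expectations:
        raise ValueError("at least one expectation is required")
    # A row passes only if it satisfies every rule, so negate the conjunction.
    combined = " AND ".join(f"({rule})" for rule in expectations.values())
    return f"NOT ({combined})"


# Same rule as in expect_all_or_drop above.
silver_rules = {"exp": "x>1"}
print(quarantine_condition(silver_rules))
```

The resulting string could then be used as a filter on the source when populating the quarantine table "x.y.q".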
The issue is that I am unable to see any lineage information for "x.y.z" (silver) in the Unity Catalog UI. Both "x.x.z" (bronze) and the quarantine table "x.y.q" display lineage correctly, and the quarantine table is located in the same schema as the silver table.
Is there a DLT limitation preventing it from capturing lineage when using apply_changes, or am I overlooking something?
Here is a minimal example that reproduces the problem:
import random

import dlt
from pyspark.sql.functions import col, expr

# Random suffix so that each run creates fresh tables
id_ = random.randint(1, 10000)

# Bronze: raw CDC files ingested from S3 with Auto Loader
@dlt.table(
    name=f"x.x.z_{id_}",
    comment="Comment",
    table_properties={"quality": "bronze"},
)
def raw_cdc_data():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("sep", ",")
        .load("s3://s3-bucket/dms/web_page/users/")
    )

# Silver: SCD type 1 target populated by apply_changes
dlt.create_streaming_table(
    name=f"x.y.z_{id_}"
)

dlt.apply_changes(
    target=f"x.y.z_{id_}",
    source=f"x.x.z_{id_}",
    keys=["id"],
    sequence_by=col("cdc_timestamp"),
    apply_as_deletes=expr("Op = 'D'"),
    except_column_list=["Op", "cdc_timestamp", "_rescued_data"],
    stored_as_scd_type="1",
)
With this code, lineage for x.y.z_{id_} is not available. However, if I replace create_streaming_table and apply_changes with:
@dlt.table(
    name=f"x.y.z_{id_}",
)
def users_dpm_3():
    return spark.read.table(f"x.x.z_{id_}")
then lineage is shown for x.y.z_{id_}.
Thanks a lot