Databricks Community

user1234567899 · 4 weeks ago

Hello,

I've spent two days trying to figure out why lineage information is missing for my silver-layer table, and I'm not sure what I'm doing wrong.

I have a DLT pipeline with the direct publishing mode (DPM) public preview enabled. Data is ingested from an S3 bucket into the bronze table. I have defined some expectations for the silver table, and there is also a quarantine table that receives the records that fail the silver table's expectations. The silver table is stored as SCD type 1. Here’s how the silver table is configured:

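import dlt
from pyspark.sql.functions import col, expr

# create_target_table is the older, deprecated name for dlt.create_streaming_table
dlt.create_target_table(
    name="x.y.z",
    comment="Some comment",
    table_properties={"quality": "silver"},
    expect_all_or_drop={"exp": "x>1"},  # drop rows that fail the expectation
)

# Upsert CDC records from the bronze table into the silver table as SCD type 1
dlt.apply_changes(
    target="x.y.z",
    source="x.x.z",
    keys=["id"],
    sequence_by=col("cdc_timestamp"),
    apply_as_deletes=expr("Op = 'D'"),
    except_column_list=["Op", "cdc_timestamp"],
    stored_as_scd_type=1,
)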
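The post describes expectations on the silver table plus a quarantine table for rejected records, but the quarantine definition isn't shown. As a rough, pure-Python illustration of that split (not DLT code): every record either satisfies all rules and goes to silver, or is routed to quarantine. `RULES` and `split_by_expectations` are hypothetical names; in the pipeline the rule is the SQL string "x>1" evaluated by DLT, and one common way to build a quarantine table is to filter on the negation of the combined rules.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Rule = Callable[[Record], bool]

# Hypothetical Python stand-in for expect_all_or_drop={"exp": "x>1"}.
# DLT evaluates the SQL string; here each rule is a Python predicate.
RULES: Dict[str, Rule] = {
    "exp": lambda r: r["x"] > 1,
}


def split_by_expectations(
    records: List[Record], rules: Dict[str, Rule]
) -> Tuple[List[Record], List[Record]]:
    """Return (passing, quarantined).

    A record passes only if it satisfies every rule, as with expect_all_or_drop;
    otherwise it goes to the quarantine list, tagged with the rules it failed.
    SQL NULL handling is not modeled: a None value for x would raise TypeError.
    """
    passing: List[Record] = []
    quarantined: List[Record] = []
    for rec in records:
        failed = [name for name, rule in rules.items() if not rule(rec)]
        if failed:
            quarantined.append({**rec, "failed_rules": failed})
        else:
            passing.append(rec)
    return passing, quarantined


rows = [{"id": 1, "x": 5}, {"id": 2, "x": 0}, {"id": 3, "x": 2}]
silver, quarantine = split_by_expectations(rows, RULES)
print([r["id"] for r in silver])      # [1, 3]
print([r["id"] for r in quarantine])  # [2]
```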

The issue is that I can't see any lineage information for "x.y.z" (silver) in the Unity Catalog UI. Both "x.x.z" (bronze) and the quarantine table "x.y.q" display lineage correctly, even though the quarantine table is in the same schema as the silver table.

Is there a DLT limitation preventing it from capturing lineage when using apply_changes, or am I overlooking something?

Here is a minimal example that reproduces it:

import random

import dlt
from pyspark.sql.functions import col, expr

# Random suffix for the table names
id_ = random.randint(1, 10000)

# Bronze: ingest the CDC CSV files from S3 with Auto Loader
@dlt.table(
    name=f"x.x.z_{id_}",
    comment="Comment",
    table_properties={"quality": "bronze"},
)
def raw_cdc_data():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("sep", ",")
        .load("s3://s3-bucket/dms/web_page/users/")
    )

# Silver: SCD type 1 target fed by apply_changes
dlt.create_streaming_table(name=f"x.y.z_{id_}")

dlt.apply_changes(
    target=f"x.y.z_{id_}",
    source=f"x.x.z_{id_}",
    keys=["id"],
    sequence_by=col("cdc_timestamp"),
    apply_as_deletes=expr("Op = 'D'"),
    except_column_list=["Op", "cdc_timestamp", "_rescued_data"],
    stored_as_scd_type=1,
)

Lineage for x.y.z_{id_} is not available. However, if I replace create_streaming_table and apply_changes with:

@dlt.table(
    name=f"x.y.z_{id_}",
)
def users_dpm_3():
    return spark.read.table(f"x.x.z_{id_}")

lineage is shown for x.y.z_{id_}.
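Apart from lineage, the two variants compute different tables: the @dlt.table version returns every bronze row as-is (including the Op and cdc_timestamp columns), while apply_changes with stored_as_scd_type=1 collapses the change feed to one current row per key. The sketch below is a simplified, in-memory Python model of those SCD type 1 semantics, meant only to make the parameters concrete. `Scd1Target` and its method are hypothetical; the real apply_changes runs inside the pipeline on Delta tables and handles details this ignores (for example, how long delete information is retained).

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Row = Dict[str, Any]
Key = Tuple[Any, ...]


class Scd1Target:
    """Hypothetical in-memory stand-in for an SCD type 1 target table."""

    def __init__(self) -> None:
        self.rows: Dict[Key, Row] = {}
        # Last sequence value applied per key, kept for deleted keys too,
        # so a late, older event cannot overwrite or resurrect newer state.
        self.last_seq: Dict[Key, Any] = {}

    def apply_changes(
        self,
        changes: Iterable[Row],
        keys: List[str],
        sequence_by: str,
        is_delete: Callable[[Row], bool],
        except_columns: List[str],
    ) -> None:
        # Process events in sequence order (mirrors sequence_by).
        for row in sorted(changes, key=lambda r: r[sequence_by]):
            key = tuple(row[k] for k in keys)
            seq = row[sequence_by]
            if key in self.last_seq and self.last_seq[key] >= seq:
                continue  # stale event: a newer change for this key was applied
            self.last_seq[key] = seq
            if is_delete(row):  # mirrors apply_as_deletes
                self.rows.pop(key, None)
            else:
                # SCD type 1: overwrite in place, dropping the excluded columns.
                self.rows[key] = {
                    c: v for c, v in row.items() if c not in except_columns
                }


target = Scd1Target()
args = dict(
    keys=["id"],
    sequence_by="cdc_timestamp",
    is_delete=lambda r: r["Op"] == "D",
    except_columns=["Op", "cdc_timestamp", "_rescued_data"],
)
target.apply_changes(
    [
        {"id": 1, "name": "a", "Op": "I", "cdc_timestamp": 1},
        {"id": 1, "name": "b", "Op": "U", "cdc_timestamp": 3},
        {"id": 2, "name": "c", "Op": "I", "cdc_timestamp": 2},
    ],
    **args,
)
# A later batch: delete id 2, plus a stale update for id 1 that is ignored.
target.apply_changes(
    [
        {"id": 2, "name": "c", "Op": "D", "cdc_timestamp": 4},
        {"id": 1, "name": "old", "Op": "U", "cdc_timestamp": 2},
    ],
    **args,
)
print(target.rows)  # {(1,): {'id': 1, 'name': 'b'}}
```

Keeping the last sequence value for deleted keys is what stops an out-of-order older event from bringing a row back.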

Thanks a lot

Nik_Vanderhoof · 3 weeks ago

Hi! Have you seen this article from Databricks? https://docs.databricks.com/aws/en/dlt/unity-catalog

It mentions that the ability of Delta Live Tables to register streaming tables and views in Unity Catalog is now in public preview, but that pipelines created before February 5, 2025 may still be using a legacy publishing mode, in which views and streaming tables do not register lineage in Unity Catalog.

I'm not sure from your example if this applies to your situation, but it's worth checking out!

