Data Engineering

Lineage not visible for table created in DLT

user1234567899
New Contributor II

Hello,

I've been struggling for two days with missing lineage information for the silver layer table, and I'm unsure what I'm doing incorrectly.

I have a DLT pipeline with DPM public preview enabled. Data is ingested from an S3 bucket into the bronze table. After that, I have defined some expectations for the silver table. Additionally, there is a quarantine table where records that do not meet the expectations for the silver table are placed. The silver table is defined to use SCD1. Here’s how the silver table is configured:

dlt.create_target_table(
    name="x.y.z",
    comment="Some comment",
    table_properties={
        "quality": "silver"},
    expect_all_or_drop={"exp": "x>1"}
)

dlt.apply_changes(
    target="x.y.z",
    source="x.x.z",
    keys=["id"],
    sequence_by=col("cdc_timestamp"),
    apply_as_deletes=expr("Op = 'D'"),
    except_column_list=["Op", "cdc_timestamp"],
    stored_as_scd_type=1
)

The issue is that I am unable to see any lineage information for "x.y.z" (silver) in the Unity Catalog UI. Both "x.x.z" (bronze) and the quarantine table "x.y.q" display lineage correctly, and the quarantine table is located in the same schema as the silver table.

Is there a DLT limitation preventing it from capturing lineage when using apply_changes, or am I overlooking something?

For example:

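import random

import dlt
from pyspark.sql.functions import col, expr

# Random suffix so each test run creates a fresh pair of tables
id_ = random.randint(1, 10000)

# Bronze table: raw CDC files ingested from S3 with Auto Loader
@dlt.table(
    name=f"x.x.z_{id_}",
    comment="Comment",
    table_properties={"quality": "bronze"},
)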
def raw_cdc_data():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("sep", ",")
        .load("s3://s3-bucket/dms/web_page/users/"))

dlt.create_streaming_table(
    name=f"x.y.z_{id_}"
)
dlt.apply_changes(
    target=f"x.y.z_{id_}",
    source=f"x.x.z_{id_}",
    keys=["id"],
    sequence_by=col("cdc_timestamp"),
    apply_as_deletes=expr("Op = 'D'"),
    except_column_list=["Op", "cdc_timestamp", "_rescued_data"],
    stored_as_scd_type="1"
)

Lineage for x.y.z_{id_} is not available. However, if I replace create_streaming_table and apply_changes with the following:

@dlt.table(
    name=f"x.y.z_{id_}",
)
def users_dpm_3():
    return spark.read.table(f"x.x.z_{id_}")

Lineage is shown for x.y.z_{id_}

Thanks a lot

1 REPLY

Nik_Vanderhoof
Contributor

Hi! Have you seen this article from Databricks? https://docs.databricks.com/aws/en/dlt/unity-catalog

It mentions that the ability for Delta Live Tables to register streaming tables and views in Unity Catalog is now in public preview, but that pipelines created before February 5, 2025 may still be using the legacy publishing mode, in which views and streaming tables would not register lineage in Unity Catalog.
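
One way to check which mode a pipeline is in is to look at its settings JSON. The sketch below rests on an assumption that is not stated in this thread: that pipelines in the newer publishing mode carry a `schema` field, while legacy-mode pipelines carry `target` instead. The function name `guess_publishing_mode` and the sample settings are hypothetical, so treat the result as a hint and confirm it against the linked article.

```python
def guess_publishing_mode(settings: dict) -> str:
    """Guess the publishing mode from a pipeline settings dict.

    Assumption (not confirmed in this thread): `schema` marks the newer
    publishing mode and `target` marks the legacy one.
    """
    if settings.get("schema"):
        return "default"
    if settings.get("target"):
        return "legacy"
    return "unknown"


# Hypothetical settings, as you might copy them from the pipeline's JSON view
legacy_settings = {"name": "users_pipeline", "catalog": "x", "target": "y"}
newer_settings = {"name": "users_pipeline", "catalog": "x", "schema": "y"}

print(guess_publishing_mode(legacy_settings))
print(guess_publishing_mode(newer_settings))
```

If this reports "legacy" for the pipeline that writes x.y.z, the limitation described in the article is a plausible explanation.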

I'm not sure from your example if this applies to your situation, but it's worth checking out!
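
To tell whether the lineage is truly missing or just not rendered in the UI, it may also help to query the lineage system table directly. This is only a sketch: it assumes system tables are enabled in your workspace and that `system.access.table_lineage` exposes the columns named in the query. The helper `lineage_query` is hypothetical and only builds the SQL text, so you would run the result yourself with `spark.sql(...)` in a notebook.

```python
import re


def lineage_query(table_full_name: str, days: int = 7) -> str:
    """Build a SQL query listing recent lineage events that write to a table.

    Assumption: system.access.table_lineage is available and has the
    columns selected below.
    """
    # Accept only plain catalog.schema.table names to keep the SQL safe
    if not re.fullmatch(r"[A-Za-z0-9_]+(\.[A-Za-z0-9_]+){2}", table_full_name):
        raise ValueError(f"expected catalog.schema.table, got {table_full_name!r}")
    return (
        "SELECT source_table_full_name, target_table_full_name, entity_type, event_time\n"
        "FROM system.access.table_lineage\n"
        f"WHERE target_table_full_name = '{table_full_name}'\n"
        f"  AND event_time >= current_timestamp() - INTERVAL {int(days)} DAYS\n"
        "ORDER BY event_time DESC"
    )


# The silver table from the question; in a notebook: spark.sql(lineage_query("x.y.z"))
print(lineage_query("x.y.z"))
```

If the query returns no rows for the silver table but does return rows for the quarantine table x.y.q, the lineage events are not being recorded at all, which would point at the pipeline configuration and not at the UI.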
