
Table lineage visibility in Databricks

sms101
New Contributor

I’ve observed differences in table lineage visibility in Databricks based on how data is referenced, and I would like to confirm if this is the expected behavior.

1. When referencing a Delta table as the source in a query (e.g., df = spark.table("catalog_test.schema.dinner")), the table lineage correctly tracks the source table under the lineage section.

2. However, when referencing a file path (e.g., df1 = spark.read.format("delta").load("s3://path/")), the lineage does not track any source table names, as the source is a file location rather than a registered table.

Is it correct that lineage tracking in Databricks primarily works at the table level and won’t capture lineage from data sources referenced by file paths? If so, are there recommended best practices for maintaining lineage visibility when using file locations as sources?

1 REPLY

Brahmareddy
Valued Contributor II

Hi @sms101,

How are you doing today?

As far as I understand, that is correct: lineage tracking in Databricks works primarily at the table level, so when you reference a registered Delta table directly, the lineage is captured. When you read from a file path instead, Databricks sees the source as just a file location, not a registered table, so no source table appears in the lineage. For better lineage visibility, register your data sources as Delta tables before referencing them in queries. Using catalog table names consistently, instead of direct file paths, is the recommended way to keep lineage tracked across your workflow.
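To make the suggestion concrete, here is a minimal sketch of registering an existing Delta location as a table and then reading it by name. It assumes you have permission to create an external table over that location. The helper names `build_register_statement` and `register_and_read` are made up for illustration, and pairing the table name with the path from your question is only an assumption.

```python
def build_register_statement(table_name: str, location: str) -> str:
    """Build a CREATE TABLE statement that registers an existing Delta
    location under a catalog table name (hypothetical helper)."""
    # Refuse quotes rather than guess at escaping rules for the path literal.
    if "'" in location:
        raise ValueError("location must not contain single quotes")
    return (
        f"CREATE TABLE IF NOT EXISTS {table_name} "
        f"USING DELTA LOCATION '{location}'"
    )


def register_and_read(spark, table_name: str, location: str):
    """Register the location as a table, then read it by table name
    instead of by path (hypothetical helper)."""
    # `spark` is the SparkSession that a Databricks notebook already provides.
    spark.sql(build_register_statement(table_name, location))
    return spark.table(table_name)


# Example with the placeholder names from the question:
statement = build_register_statement("catalog_test.schema.dinner", "s3://path/")
print(statement)
```

In a notebook you would then call `df1 = register_and_read(spark, "catalog_test.schema.dinner", "s3://path/")` in place of the path-based `load(...)`, so downstream queries reference the registered table.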

Please let me know if it works and have a good day.

Regards,

Brahma
