cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

system.access.table_lineage - source and target table meanings

eriodega
Contributor

I've been using the system.access.table_lineage table, and I'm trying to understand when the source and target tables are defined.
For example, picking a specific job run and looking at the lineage:

select
source_type,
source_table_full_name,
target_type,
target_table_full_name
from system.access.table_lineage
where 1=1
and entity_id = 279912916130857 and entity_run_id = 335708887601361
order by event_time DESC
LIMIT 1000;

I see a few scenarios:

  1. source_type = null and target_type =TABLE (I think this is a write to the table)
  2. source_type = TABLE, target_type = null (I think this is a read of a table)
  3. source_type = TABLE (table X), target_type = TABLE (table Y) - I don't know what this is
  4. source_type = VIEW, target_type = TABLE - I don't know what this is
  5. source_type = TABLE, target_type = VIEW (I think this is a read of the "target" view, indicating it then reads from the "source" of the table)

I looked over the documentation in

DESCRIBE TABLE EXTENDED system.access.table_lineage

but it did not help with my understanding.

2 REPLIES 2

Sidhant07
Databricks Employee
Databricks Employee

Hi Eriodega,

1. **source_type = null and target_type = TABLE**:
- This indicates a write operation to the table. The source is null because there is no source table involved in the operation, only a target table that is being written to.

2. **source_type = TABLE, target_type = null**:
- This indicates a read operation from the table. The target is null because there is no target table involved in the operation, only a source table that is being read from.

3. **source_type = TABLE (table X), target_type = TABLE (table Y)**:
- This scenario represents a transformation or data movement operation where data is read from one table (table X) and written to another table (table Y). This could be part of an ETL (Extract, Transform, Load) process or a similar data pipeline.

4. **source_type = VIEW, target_type = TABLE**:
- This indicates that data is being read from a view and written to a table. The view acts as the source, and the table is the target where the data is being written.

5. **source_type = TABLE, target_type = VIEW**:
- This scenario suggests that the view is being created or updated based on the data read from the table. The table is the source, and the view is the target that is being defined or modified.

 

eriodega
Contributor

@Sidhant07thanks for the answer, I think it is good, but I am questioning scenario #5 (source=table,target=view).
I'm looking at some examples in our table_lineage, and we aren't modifying the view or creating the view from within a job. I think scenario #5 is more likely "queried a view X and the query that defines view X has queried table Y".
Or does "target" always imply some sort of write operation?

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group