Data lineage on views
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
07-26-2023 12:07 AM
I do not know if this is intended behavior of data lineage but for me it is weird.
When I create a view based on two tables the data lineage upstream looks correct. But when I replace the view to only use one of the tables, then data lineage upstream still telling me that the view is based on the two tables.
When I read the documentation, it looks like data lineage is based on history for the view for the last 30 days, but a view does not work with history the same way as a table, so to me this is weird.
Can anyone give some more details on this strange behavior?
This is my script for testing this:
/* create 2 tables */
create table if not exists test.test.table_1
(id int not null
,name string);
insert into test.test.table_1
values (1, "test 1");
create table if not exists test.test.table_2
(id int not null
,name string);
insert into test.test.table_2
values (1, "test 2");
/* create a view depending on the to tables */
create or replace view test.test.view_1 as
select
a.id
, b.name
from
test.test.table_1 as a
inner join test.test.table_2 as b
on a.id=b.id;
/*
Data lineage for view is correct upstream for the view are table_1 and table_2
-----------------------------------------------------------------------------------------------------------------------------------
*/
/* now change the view to depend only on table_1 */
alter view test.test.view_1 as
select
a.id
, a.name
from
test.test.table_1 as a;
/*
Data lineage has not changed! - we still have both tables in upstream for the view
*/
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
07-26-2023 02:54 AM
Hi @Henrik Good day!
I tried to repro this issue internally and I was able to repro this, this seems like limitation that:
- Unity Catalog captures lineage to the column level as much as possible. However, there are some cases where column-level lineage cannot be captured.
When I checked view defination under details after altering the view to depend only on table_1. I found that view defination was updated with latest view but it's not getting reflected in lineage graph.
https://docs.databricks.com/data-governance/unity-catalog/data-lineage.html#limitations
Regards,
Vinay M R
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
07-26-2023 11:21 PM
After some thoughts, i have come to this conclusion:
Data lineage on views is working as one should expect. I strongly recommend that this feature is redesigned so it shows the result of the lastest view.

