I do not know if this is intended behavior of data lineage but for me it is weird.
When I create a view based on two tables the data lineage upstream looks correct. But when I replace the view to only use one of the tables, then data lineage upstream still telling me that the view is based on the two tables.
When I read the documentation, it looks like data lineage is based on history for the view for the last 30 days, but a view does not work with history the same way as a table, so to me this is weird.
Can anyone give some more details on this strange behavior?
This is my script for testing this:
/* create 2 tables */
create table if not exists test.test.table_1
(id int not null
,name string);
insert into test.test.table_1
values (1, "test 1");
create table if not exists test.test.table_2
(id int not null
,name string);
insert into test.test.table_2
values (1, "test 2");
/* create a view depending on the to tables */
create or replace view test.test.view_1 as
select
a.id
, b.name
from
test.test.table_1 as a
inner join test.test.table_2 as b
on a.id=b.id;
/*
Data lineage for view is correct upstream for the view are table_1 and table_2
-----------------------------------------------------------------------------------------------------------------------------------
*/
/* now change the view to depend only on table_1 */
alter view test.test.view_1 as
select
a.id
, a.name
from
test.test.table_1 as a;
/*
Data lineage has not changed! - we still have both tables in upstream for the view
*/