cancel
Showing results for 
Search instead for 
Did you mean: 
Community Platform Discussions
Connect with fellow community members to discuss general topics related to the Databricks platform, industry trends, and best practices. Share experiences, ask questions, and foster collaboration within the community.
cancel
Showing results for 
Search instead for 
Did you mean: 

Data lineage on views

Henrik
New Contributor III

I do not know if this is intended behavior of data lineage but for me it is weird.
When I create a view based on two tables the data lineage upstream looks correct. But when I replace the view to only use one of the tables, then data lineage upstream still telling me that the view is based on the two tables.
When I read the documentation, it looks like data lineage is based on history for the view for the last 30 days, but a view does not work with history the same way as a table, so to me this is weird.
Can anyone give some more details on this strange behavior?
This is my script for testing this:

/* create 2 tables */
create table if not exists test.test.table_1
(id int not null
,name string);

insert into test.test.table_1
values (1, "test 1");

create table if not exists test.test.table_2
(id int not null
,name string);

insert into test.test.table_2
values (1, "test 2");

/* create a view depending on the to tables */

create or replace view test.test.view_1 as
select
a.id
, b.name
from
test.test.table_1 as a
inner join test.test.table_2 as b
on a.id=b.id;

/*
Data lineage for view is correct upstream for the view are table_1 and table_2
-----------------------------------------------------------------------------------------------------------------------------------
*/

/* now change the view to depend only on table_1 */
alter view test.test.view_1 as
select
a.id
, a.name
from
test.test.table_1 as a;

/*
Data lineage has not changed! - we still have both tables in upstream for the view
*/

2 REPLIES 2

Vinay_M_R
Databricks Employee
Databricks Employee

Hi @Henrik  Good day!

I tried to repro this issue internally and I was able to repro this, this seems like limitation that:

  • Unity Catalog captures lineage to the column level as much as possible. However, there are some cases where column-level lineage cannot be captured.

When I checked view defination under details after altering the view to depend only on table_1. I found that view defination was updated with latest view but it's not getting reflected in lineage graph.

https://docs.databricks.com/data-governance/unity-catalog/data-lineage.html#limitations

Regards,

Vinay M R

Henrik
New Contributor III

After some thoughts, i have come to this conclusion:

Data lineage on views is working as one should expect. I strongly recommend that this feature is redesigned so it shows the result of the lastest view.

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group