Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

system.access.table_lineage table missing data

aranjan99
New Contributor III

I am using the system.access.table_lineage table to figure out which tables are accessed by SQL queries, along with the corresponding queries. However, I often find data or values missing from this table.

For example, for SQL queries executed by our DBT jobs, system.access.table_lineage has an entry, but the entity run id (which should be the query id in this case) is NULL, even though the query history API and the UI both show the corresponding queries. Why is the entity run id not populated in this case?

I am also noticing that this table is missing entries for some reads entirely. Our DBT jobs read from a few tables once every hour, but system.access.table_lineage often has only 20-22 entries per day for those tables instead of 24, even though the query history API and the UI show all 24 corresponding queries.
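To pin down which hourly reads are missing, one option is to compare the two sources hour by hour. The snippet below is only a rough sketch: it assumes the query start times (from the query history API) and the lineage event times (from system.access.table_lineage) have already been pulled into Python lists, and that a lineage event lands in the same clock hour as its query. The helper name `missing_hours` and the sample data are made up for illustration.

```python
from datetime import datetime, timedelta


def missing_hours(query_starts, lineage_event_times, day):
    """Return the hours of `day` that have a query but no lineage event."""

    def hours_on_day(timestamps):
        # Keep only timestamps on the requested day and reduce them to the hour.
        return {ts.hour for ts in timestamps if ts.date() == day}

    return sorted(hours_on_day(query_starts) - hours_on_day(lineage_event_times))


# Hypothetical data: one query per hour, lineage missing for hours 3 and 17.
day = datetime(2024, 1, 1).date()
query_starts = [datetime(2024, 1, 1, h, 5) for h in range(24)]
lineage_event_times = [
    t + timedelta(seconds=30) for t in query_starts if t.hour not in (3, 17)
]

print(missing_hours(query_starts, lineage_event_times, day))
```

If lineage events can be delayed past the hour boundary, the hour-level match would need a tolerance window instead.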

This looks like a bug to me. Can someone help explain why this would be the case?

 

4 REPLIES

@Retired_mod I have access to the system.access.table_lineage table, and I can see some data in there. My question is specifically about incorrect and missing data in this table.

jacovangelder
Honored Contributor

Is all your ETL querying/referencing the full table name (i.e. catalog.schema.table)? If you query Delta files directly, for example, metadata for data lineage will not be captured.

Yes, it references the full table name, and these are all SQL tables, not Delta files queried directly.
If I run the exact same query via a Databricks job, the entity run ids are populated. But if I run it via DBT jobs, the entity run ids are always NULL.
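One way to confirm this pattern across all rows is to tally lineage rows by entity type and by whether the run id is set. This is a minimal sketch, assuming the rows from system.access.table_lineage have been fetched as dictionaries keyed by the `entity_type` and `entity_run_id` columns. The helper name `null_run_id_counts` and the sample values are invented for illustration and are not real output.

```python
from collections import Counter


def null_run_id_counts(rows):
    """Count lineage rows per (entity_type, run id is NULL) pair."""
    counts = Counter()
    for row in rows:
        # A missing key is treated the same as a NULL column value.
        key = (row.get("entity_type"), row.get("entity_run_id") is None)
        counts[key] += 1
    return counts


# Hypothetical rows: one with a run id, two without.
rows = [
    {"entity_type": "JOB", "entity_run_id": "run-1"},
    {"entity_type": None, "entity_run_id": None},
    {"entity_type": None, "entity_run_id": None},
]

for (entity_type, run_id_is_null), n in null_run_id_counts(rows).items():
    print(entity_type, run_id_is_null, n)
```

If the NULL run ids all fall under one entity type (or under a NULL entity type), that narrows the problem down to how those queries are submitted.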

goldenmountain
New Contributor II

@aranjan99 did you ever get an answer or reach a conclusion about the limitations of Unity Catalog with regard to tracking access via SQL?
