How to keep infrequently updated tables from dropping out of Unity Catalog data lineage?
11-04-2022 06:07 AM
With Unity Catalog as the unified metastore for Databricks, we can track the data lineage of tables.
Lineage is kept for 30 days, as described in the official documentation:
- Because lineage is computed on a 30-day rolling window, lineage is not displayed for tables that have not been modified within the last 30 days.
So if a table is not updated for 30 days, its lineage is no longer visible. The lineage becomes visible again once the table is updated.
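To make the rule concrete, here is a minimal sketch of the window arithmetic. The function names are my own, and treating the 30-day boundary as inclusive is an assumption; the documentation quoted above only says "within the last 30 days".

```python
from datetime import datetime, timedelta, timezone

# 30-day rolling window, as stated in the documentation quoted above.
LINEAGE_WINDOW = timedelta(days=30)


def lineage_visible(last_modified: datetime, now: datetime) -> bool:
    """Return True if the last modification falls inside the rolling window.

    Assumption: the boundary is inclusive (exactly 30 days still counts).
    """
    return now - last_modified <= LINEAGE_WINDOW


def days_until_hidden(last_modified: datetime, now: datetime) -> int:
    """Whole days left before the table would drop out of the window."""
    remaining = LINEAGE_WINDOW - (now - last_modified)
    return max(remaining.days, 0)


if __name__ == "__main__":
    now = datetime(2022, 11, 4, tzinfo=timezone.utc)
    monthly = datetime(2022, 10, 20, tzinfo=timezone.utc)    # updated 15 days ago
    quarterly = datetime(2022, 8, 1, tzinfo=timezone.utc)    # updated 95 days ago
    print(lineage_visible(monthly, now), days_until_hidden(monthly, now))
    print(lineage_visible(quarterly, now), days_until_hidden(quarterly, now))
```

A table refreshed quarterly spends most of each quarter outside the window, which is exactly the case I am trying to cover.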
I am looking for a way around this limitation for use cases that need longer retention (e.g. quarterly or annual reporting).
What I have tried already:
I checked whether an OPTIMIZE operation changes 'updated at' in UC and got the following result:
- The operation creates a new version of the table in the table history
- However, 'updated at' on the 'Details' tab in Unity Catalog does not change
I think this is because OPTIMIZE and similar operations change the file layout but not the data.
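My hypothesis can be sketched as follows: only some entries in the table history count as data changes. This is a guess at the behaviour I observed, not documented logic. The set of maintenance operation names and the row layout (dicts with `operation` and `timestamp` keys, in the style of DESCRIBE HISTORY output) are assumptions for illustration.

```python
from datetime import datetime

# Assumption: operations that rewrite or clean up files without changing data.
MAINTENANCE_OPS = {"OPTIMIZE", "VACUUM START", "VACUUM END"}


def last_data_change(history):
    """Return the latest timestamp of a non-maintenance operation, or None.

    `history` is a list of dicts with "operation" and "timestamp" keys
    (hypothetical layout modelled on table history rows).
    """
    changes = [
        row["timestamp"]
        for row in history
        if row["operation"] not in MAINTENANCE_OPS
    ]
    return max(changes) if changes else None


if __name__ == "__main__":
    history = [
        {"version": 0, "operation": "WRITE", "timestamp": datetime(2022, 9, 1)},
        {"version": 1, "operation": "OPTIMIZE", "timestamp": datetime(2022, 11, 1)},
    ]
    # The OPTIMIZE created version 1, but the last data change is still the WRITE.
    print(last_data_change(history))
```

If the hypothesis holds, running OPTIMIZE on a schedule would add history versions without ever moving the table back into the lineage window.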
The best solution for these cases would be to collect the lineage through the API and visualize it inside Unity Catalog. However, Unity Catalog does not allow lineage to be inserted manually.
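What still seems possible is to snapshot the lineage outside Unity Catalog while it is visible. Below is a rough sketch, not a tested integration. The endpoint path and the `table_name` / `include_entity_lineage` parameters are my reading of the Data Lineage REST API, and passing them as query parameters is an assumption, so please check them against the current documentation. The host, token, and table name in the usage part are placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint path for table lineage; verify against the REST API docs.
LINEAGE_PATH = "/api/2.0/lineage-tracking/table-lineage"


def build_lineage_request(host: str, token: str, table_name: str) -> Request:
    """Build (but do not send) a GET request for one table's lineage."""
    query = urlencode({"table_name": table_name, "include_entity_lineage": "true"})
    url = f"{host.rstrip('/')}{LINEAGE_PATH}?{query}"
    return Request(url, headers={"Authorization": f"Bearer {token}"})


def snapshot_lineage(host: str, token: str, table_name: str, out_path: str) -> dict:
    """Fetch the lineage and write the raw JSON response to a local file."""
    with urlopen(build_lineage_request(host, token, table_name)) as resp:
        lineage = json.load(resp)
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(lineage, fh, indent=2)
    return lineage


if __name__ == "__main__":
    # Placeholder values; nothing is sent here, the URL is only printed.
    req = build_lineage_request(
        "https://example.cloud.databricks.com", "my-token", "main.sales.orders"
    )
    print(req.full_url)
```

Run on a schedule shorter than 30 days, this would keep an external copy of the lineage, although the copy still could not be shown in the Unity Catalog UI.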
12-17-2022 11:16 PM
This is really interesting, I have to explore this more.
05-27-2023 03:08 PM
@Natalia Lebedeva did you discover any other possible workaround?

