Using Unity Catalog as a unified metastore for Databricks, we are able to track the data lineage of tables.
Lineage is only retained for 30 days, as described in the official documentation:
- Because lineage is computed on a 30-day rolling window, lineage is not displayed for tables that have not been modified within the last 30 days.
This means that if a table is not updated for 30 days, its data lineage is no longer visible. The lineage becomes visible again once the table is updated.
I am trying to find a way around this limitation for use cases that need longer retention (e.g. quarterly or annual reporting).
What I have tried already:
I checked whether running an OPTIMIZE operation changes the 'updated at' value in UC and got the following result:
- The operation creates a new version of the table in the table history
- However, 'updated at' on the 'Details' tab in Unity Catalog does not change
I think this is because OPTIMIZE and similar operations change the layout of the files but not the data.
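To make that observation concrete, here is a minimal sketch that separates maintenance-only versions from real data changes in a table's history. It assumes the history rows have been collected from DESCRIBE HISTORY into plain dicts with "version" and "operation" keys. The helper name, the sample rows, and the set of operations treated as maintenance are my own assumptions for illustration, not something Unity Catalog documents as driving 'updated at'.

```python
# Operations assumed (for illustration) to rewrite files without changing data.
MAINTENANCE_OPERATIONS = {"OPTIMIZE", "VACUUM START", "VACUUM END"}


def last_data_change(history_rows):
    """Return the newest history row that is not a maintenance-only operation.

    history_rows: iterable of dicts with at least "version" and "operation",
    e.g. rows collected from DESCRIBE HISTORY. Returns None if every row is
    a maintenance operation.
    """
    data_changes = [
        row for row in history_rows
        if row["operation"] not in MAINTENANCE_OPERATIONS
    ]
    if not data_changes:
        return None
    return max(data_changes, key=lambda row: row["version"])


# Hypothetical history: OPTIMIZE added version 2, but the last write is version 1.
sample_history = [
    {"version": 0, "operation": "CREATE TABLE AS SELECT"},
    {"version": 1, "operation": "WRITE"},
    {"version": 2, "operation": "OPTIMIZE"},
]

latest = last_data_change(sample_history)
print(latest["version"], latest["operation"])
```

With this sample the table history ends at version 2, while the last operation that touched the data is the write at version 1, which matches what I see in the 'Details' tab.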
The best solution for these specific cases would be to gather the lineage data through the API and visualize it inside Unity Catalog. However, Unity Catalog does not allow lineage to be inserted manually.
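As a partial workaround, the lineage could at least be pulled through the API and archived outside Unity Catalog before the 30-day window expires. The sketch below assumes the Data Lineage REST API endpoint /api/2.0/lineage-tracking/table-lineage with a JSON body containing table_name and include_entity_lineage; please verify both against the API reference for your workspace. The function names, the host, and the table name are hypothetical. It does not bring the lineage back into the Unity Catalog UI.

```python
import json
import urllib.request

# Endpoint path assumed from the Databricks Data Lineage REST API docs.
LINEAGE_PATH = "/api/2.0/lineage-tracking/table-lineage"


def build_lineage_request(host, token, table_name):
    """Build (but do not send) a table-lineage request for one table."""
    body = json.dumps({"table_name": table_name, "include_entity_lineage": True})
    return urllib.request.Request(
        url=host.rstrip("/") + LINEAGE_PATH,
        data=body.encode("utf-8"),
        method="GET",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def fetch_lineage(host, token, table_name):
    """Send the request and return the parsed JSON response."""
    request = build_lineage_request(host, token, table_name)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def snapshot_record(table_name, lineage, captured_at):
    """Wrap a lineage response so it can be appended to an external archive."""
    return {"table_name": table_name, "captured_at": captured_at, "lineage": lineage}
```

Running fetch_lineage on a schedule (for example weekly) and appending snapshot_record(...) to a table or file would keep a history that outlives the rolling window, at the cost of building the visualization separately.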