Data Governance
Join discussions on data governance practices, compliance, and security within the Databricks Community. Exchange strategies and insights to ensure data integrity and regulatory compliance.

Integrating Unity Catalog ML Models into Data Lineage

23940829381
New Contributor II

I saw a really nice article (https://www.databricks.com/blog/announcing-public-preview-volumes-databricks-unity-catalog) on the incorporation of various elements of data lineage within Unity Catalog. In my own exploration, I've been able to replicate graphs linking volumes to tables and tables to tables, but I have not been able to achieve the attached example, where a model is tied into the lineage graph.

When I register models in Unity Catalog and perform transformations with them, no linkage to the model itself ever shows up. Further, if I look at a registered model within Unity Catalog, it does show the graph element for the model itself, as in the attached photo, but it is linked neither to the training/ingest tables nor to the predictive output tables.

Has anyone been able to replicate the attached image? Thank you!

 

2 REPLIES

Kaniz_Fatma
Community Manager

Hi @23940829381, thank you for sharing your interest in data lineage within Unity Catalog! It's a powerful feature that allows you to track the flow of data across various elements in your Databricks environment.

Let's delve into this further:

  1. Unity Catalog and Data Lineage:

    • Unity Catalog captures runtime data lineage across queries executed on Azure Databricks. It supports all languages and captures lineage down to the column level. This lineage data includes information related to notebooks, workflows, and dashboards associated with the queries.
    • You can visualize lineage using Catalog Explorer in near real-time and retrieve it via the Databrick...1.
    • Lineage is aggregated across all workspaces attached to a Unity Catalog metastore, meaning that lineage captured in one workspace is visible in any other workspace sharing the same metastore.
    • Users must have the correct permissions to view lineage data, and lineage data is retained for one year.
  2. Model Lineage:

    • While Unity Catalog captures lineage for tables, views, and columns, it also extends to models.
    • Models are tracked and logged using MLflow and registered in Unity Catalog.
    • The lineage for models provides insights into how they are used, including their upstream and downstream dependencies.
    • This is particularly valuable for understanding the impact of model predictions and ensuring governance, especially for sensitive data like PII or GDPR-related information.
  3. Challenges with Model Linkage:

    • You've pointed out an interesting observation: when you register models in Unity Catalog and perform transformations, the linkage to the model itself doesn't appear.
    • It's essential to note that lineage is computed on a 1-year rolling window. If a job or query reads data from one table (e.g., training data) and writes to another (e.g., predictive output), the link between them is displayed for only one year.
    • If the model was registered more than a year ago, it might not appear in the lineage graph. Ensure that your model registration and transformations fall within this window.
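
One rough way to check the window described in point 3 is sketched below. It is only an illustration under stated assumptions: that the lineage system table `system.access.table_lineage` is enabled in your metastore, and that it exposes the `source_table_full_name`, `target_table_full_name`, `entity_type`, and `event_time` columns. The function names and the `main.ml.predictions` table name are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 365  # one-year rolling window mentioned in point 3


def within_lineage_window(event_time, now=None):
    """Return True if a timezone-aware event_time is inside the rolling window."""
    now = now or datetime.now(timezone.utc)
    return now - event_time <= timedelta(days=RETENTION_DAYS)


def lineage_query(table_full_name):
    """Build a SQL query listing recent lineage events that touch a table.

    Assumption: system.access.table_lineage is enabled and has the
    columns selected below.
    """
    parts = table_full_name.split(".")
    # Accept only simple catalog.schema.table names so the value is safe to inline.
    if len(parts) != 3 or not all(p.replace("_", "").isalnum() for p in parts):
        raise ValueError("expected a catalog.schema.table name")
    return (
        "SELECT source_table_full_name, target_table_full_name, "
        "entity_type, event_time "
        "FROM system.access.table_lineage "
        f"WHERE (source_table_full_name = '{table_full_name}' "
        f"OR target_table_full_name = '{table_full_name}') "
        f"AND event_time >= current_timestamp() - INTERVAL {RETENTION_DAYS} DAYS "
        "ORDER BY event_time DESC"
    )
```

In a Databricks notebook, something like `spark.sql(lineage_query("main.ml.predictions")).show()` would then list whatever lineage events were recorded for the output table. If rows appear there but the model is still missing from the graph, the one-year window is probably not the cause.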

 

[Image: Unity Catalog Model Lineage]

 

Hi @Kaniz_Fatma -- thank you for your response! I wanted to confirm that we have been able to see the lineage graph without any problem for all non-ML volume/table transformations. In addition, each time we retrained models and registered a best performer, we could see its versioning lineage in Unity Catalog, and the versions themselves were each linked back to MLflow runs, as you pointed out. All of the models and data we were using are far less than a year old, so based on the third point, I am still surprised that they didn't show up in the lineage. Do you have any other suggestions? Thank you!
