Data Engineering

Lineage between model and source code breaks on movement of source notebook. How to rectify it?

Maverick1
Valued Contributor II

If there is a registered model and it is linked with a notebook, then the lineage breaks if you move the notebook to a different path or even pull/upload a new version of the notebook.

This is a problem because development and testing are usually done in a messy way, but once the code needs to go to production you are left with one of two options:

  1. Move your code, re-train to generate the same model, and then promote that model to the higher environment so that the lineage is maintained; or
  2. Keep the dev source notebook at the exact path where it was created for dev usage.
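One possible way to make lineage less dependent on the notebook's workspace path is to record your own source identifiers as run tags at training time. The following is only a minimal sketch of that idea, not a confirmed fix for the broken "source" link: the tag keys (`lineage.repo_url`, `lineage.git_commit`, `lineage.notebook_relpath`) and both helper functions are hypothetical names chosen for illustration, and it assumes the training code is tracked in a git repository. `mlflow.set_tags` is the standard MLflow call for attaching tags to the active run.

```python
from typing import Dict


def build_lineage_tags(repo_url: str, commit: str, notebook_relpath: str) -> Dict[str, str]:
    """Build run tags that identify the training code without relying on a workspace path.

    The tag keys are hypothetical; pick whatever naming convention your team uses.
    """
    if not commit:
        raise ValueError("commit must be a non-empty revision identifier")
    return {
        "lineage.repo_url": repo_url,
        "lineage.git_commit": commit,
        # Store the path relative to the repo root, so moving the notebook
        # inside the workspace does not invalidate it.
        "lineage.notebook_relpath": notebook_relpath.lstrip("/"),
    }


def log_lineage_tags(tags: Dict[str, str]) -> None:
    """Attach the tags to the active MLflow run (call this inside the training run)."""
    import mlflow  # imported lazily so the helper above works without MLflow installed

    mlflow.set_tags(tags)
```

These tags sit alongside the built-in lineage rather than replacing it, so the model's "source" link may still break when the notebook moves; the tags only give you an independent record of which code produced the run.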
1 ACCEPTED SOLUTION

Accepted Solutions

Maverick1
Valued Contributor II

Hi @Jose Gonzalez​ ,

The issue still persists. I believe this is related to the workspace version that we are using.

I have run through the exact steps in an E2 workspace and the issue is somehow resolved there.

10 REPLIES

-werners-
Esteemed Contributor III

Hi @Saurabh Verma​ ,

as in your other topic, I suggest looking into MLflow as this is designed to handle all these issues.

Maverick1
Valued Contributor II

@Werner Stinckens​ : Hi Werners,

Thanks for the reply. But this issue is happening in MLflow tracking itself. I wanted to know whether there is a way to mitigate it.

-werners-
Esteemed Contributor III

OK, I see.

I do not have a clear answer for that.

But there was a session at Spark Summit about this (well, more CI/CD related, but it might give you ideas):

https://databricks.com/session_na20/productionalizing-models-through-ci-cd-design-with-mlflow

The whole CI/CD shebang might be too much for your needs, but maybe you can pick out the parts that are useful?

Anonymous
Not applicable

@Werner Stinckens​ - Thank you so much!

Jin1
New Contributor II

Hi Maverick,

I'm unable to reproduce the issue you mentioned. Where is your notebook located? Is it stored in a git-versioned Repo directory (accessed via "Repos" instead of "Workspace" icon on the navigation bar)?

Maverick1
Valued Contributor II

@Jin Zhang​ : To reproduce this issue:

  1. Create a notebook in your workspace account and generate a model from it.
  2. Go to the model stats page, where you can see the model's lineage to the original notebook.
  3. Create a folder in your workspace and move your notebook to that folder. Now move it back to where it was before.
  4. Go back to the model stats page and click on the “source” link, which represents the original notebook lineage. It shows the error “notebook not found”, although the notebook is on the same path where it is supposed to be.
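While reproducing the steps above, it may help to look at what source information MLflow actually recorded for the run behind the model version. The sketch below is a diagnostic aid under stated assumptions: `describe_source` and `fetch_run_tags` are hypothetical helper names, and the two `mlflow.databricks.*` tag keys are what Databricks-managed MLflow is expected to set for notebook runs, so they may be absent in your workspace. Whether the “source” link resolves by path or by notebook ID is not established in this thread.

```python
from typing import Dict, Optional

# Tag keys to inspect. "mlflow.source.name" is a standard MLflow system tag;
# the two Databricks keys are assumed to be present for notebook runs.
SOURCE_TAG_KEYS = (
    "mlflow.source.name",
    "mlflow.databricks.notebookPath",
    "mlflow.databricks.notebookID",
)


def describe_source(tags: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Pick the source-related entries out of a run's tags (None when a tag is missing)."""
    return {key: tags.get(key) for key in SOURCE_TAG_KEYS}


def fetch_run_tags(model_name: str, version: str) -> Dict[str, str]:
    """Return the tags of the run that produced a registered model version."""
    from mlflow.tracking import MlflowClient  # lazy import; needs MLflow and workspace access

    client = MlflowClient()
    model_version = client.get_model_version(name=model_name, version=version)
    if not model_version.run_id:
        return {}  # the version was registered without a source run
    return dict(client.get_run(model_version.run_id).data.tags)
```

Calling `describe_source(fetch_run_tags("my_model", "1"))` before and after moving the notebook (the model name and version here are placeholders) would show whether the recorded path or ID changes, which could help narrow down why the link fails in one workspace and not another.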

sean_owen
Honored Contributor II

I also cannot reproduce this with these exact steps (I think). After moving the notebook and moving it back, the link to it (and the link to the revision) still works as expected. You are using the MLflow built in to Databricks, right?

Maverick1
Valued Contributor II

@Sean Owen​ : Yes. Managed MLflow on Databricks.

Hi @Saurabh Verma​ ,

Did Sean's reply help you solve this issue, or are you still waiting for a solution to unblock you?

Maverick1
Valued Contributor II

Hi @Jose Gonzalez​ ,

The issue still persists. I believe this is related to the workspace version that we are using.

I have run through the exact steps in an E2 workspace and the issue is somehow resolved there.
