<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Table-Model Lineage for models without online Feature Lookups in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/table-model-lineage-for-models-without-online-feature-lookups/m-p/96606#M3749</link>
    <description>&lt;P&gt;Hi community,&lt;/P&gt;&lt;P&gt;I am looking for the recommended way to achieve table-model lineage in Unity Catalog for models that don't use Feature Lookups but only offline features.&amp;nbsp;&lt;/P&gt;&lt;P&gt;When I use &lt;EM&gt;FeatureEngineeringClient.create_training_set&lt;/EM&gt; with &lt;EM&gt;feature_lookups&lt;/EM&gt; + mlflow experiment tracking, this works well and the respective feature stores show up in the model lineage. However, I haven't found a way to use offline features only.&lt;/P&gt;&lt;P&gt;Tracking an mlflow model without &lt;EM&gt;FeatureEngineeringClient.create_training_set&lt;/EM&gt; works but then the lineage doesn't show up in Unity. Passing an empty list as the feature_lookups results in&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt; WARNING databricks.ml_features._catalog_client._catalog_client_helper: Failed to record consumer in the catalog. Exception: {'error_code': 'NOT_FOUND', 'message': 'Workspace Feature Store has been deprecated in the current workspace. Databricks recommends using Feature Engineering in Unity Catalog.&lt;/LI-CODE&gt;&lt;P&gt;and the lineage won't show up either. This is particularly weird since there is no such warning when I pass actual FeatureLookups instead of the empty list.&lt;/P&gt;&lt;P&gt;Thanks for any help&lt;/P&gt;&lt;P&gt;#featurestore #mlflow&lt;/P&gt;</description>
    <pubDate>Tue, 29 Oct 2024 08:01:10 GMT</pubDate>
    <dc:creator>ssequ</dc:creator>
    <dc:date>2024-10-29T08:01:10Z</dc:date>
    <item>
      <title>Table-Model Lineage for models without online Feature Lookups</title>
      <link>https://community.databricks.com/t5/machine-learning/table-model-lineage-for-models-without-online-feature-lookups/m-p/96606#M3749</link>
      <description>&lt;P&gt;Hi community,&lt;/P&gt;&lt;P&gt;I am looking for the recommended way to achieve table-model lineage in Unity Catalog for models that don't use Feature Lookups but only offline features.&amp;nbsp;&lt;/P&gt;&lt;P&gt;When I use &lt;EM&gt;FeatureEngineeringClient.create_training_set&lt;/EM&gt; with &lt;EM&gt;feature_lookups&lt;/EM&gt; + mlflow experiment tracking, this works well and the respective feature stores show up in the model lineage. However, I haven't found a way to use offline features only.&lt;/P&gt;&lt;P&gt;Tracking an mlflow model without &lt;EM&gt;FeatureEngineeringClient.create_training_set&lt;/EM&gt; works but then the lineage doesn't show up in Unity. Passing an empty list as the feature_lookups results in&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt; WARNING databricks.ml_features._catalog_client._catalog_client_helper: Failed to record consumer in the catalog. Exception: {'error_code': 'NOT_FOUND', 'message': 'Workspace Feature Store has been deprecated in the current workspace. Databricks recommends using Feature Engineering in Unity Catalog.&lt;/LI-CODE&gt;&lt;P&gt;and the lineage won't show up either. This is particularly weird since there is no such warning when I pass actual FeatureLookups instead of the empty list.&lt;/P&gt;&lt;P&gt;Thanks for any help&lt;/P&gt;&lt;P&gt;#featurestore #mlflow&lt;/P&gt;</description>
      <pubDate>Tue, 29 Oct 2024 08:01:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/table-model-lineage-for-models-without-online-feature-lookups/m-p/96606#M3749</guid>
      <dc:creator>ssequ</dc:creator>
      <dc:date>2024-10-29T08:01:10Z</dc:date>
    </item>
    <item>
      <title>Re: Table-Model Lineage for models without online Feature Lookups</title>
      <link>https://community.databricks.com/t5/machine-learning/table-model-lineage-for-models-without-online-feature-lookups/m-p/137259#M4397</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/106753"&gt;@ssequ&lt;/a&gt;, sorry this fell through the cracks, but I have some ideas for you to consider.&lt;/P&gt;
&lt;DIV class="paragraph"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;You can get Unity Catalog table→model lineage without Feature Lookups by logging the training datasets to MLflow and registering the model in Unity Catalog.&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;H3 class="paragraph"&gt;Recommended approach (offline features only)&lt;/H3&gt;
&lt;DIV class="paragraph"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;Use &lt;STRONG&gt;MLflow dataset logging&lt;/STRONG&gt; to record the UC tables you trained/evaluated on, then register the model to Unity Catalog:&lt;/DIV&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;Ensure you’re on &lt;STRONG&gt;MLflow ≥ 2.11&lt;/STRONG&gt;; table→model lineage uses mlflow.log_input and is supported from 2.11 onward.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;Load your training data from &lt;STRONG&gt;UC tables&lt;/STRONG&gt; and create MLflow dataset objects (for example with mlflow.data.load_delta) so lineage can resolve to UC assets.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;Call &lt;STRONG&gt;mlflow.log_input(dataset, context="training")&lt;/STRONG&gt; for each upstream table or for a snapshot table you create for training, then &lt;STRONG&gt;log and register&lt;/STRONG&gt; your model to UC; lineage will appear on the model version’s Lineage tab in Catalog Explorer.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;Include a &lt;STRONG&gt;model signature&lt;/STRONG&gt; (either provide it or let MLflow infer it via input_example) because UC requires model versions to have signatures when registering.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4 class="paragraph"&gt;Minimal example&lt;/H4&gt;
&lt;DIV class="paragraph"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;PRE&gt;&lt;CODE&gt;import mlflow
from sklearn.ensemble import RandomForestClassifier

# 1) Load the UC table used for training and wrap it as an MLflow dataset
dataset = mlflow.data.load_delta(
    table_name="prod.ml_team.features_customer_churn", version="42"
)
pdf = dataset.df.toPandas()
X = pdf.drop(columns=["label"])
y = pdf["label"]

with mlflow.start_run():
    # 2) Train
    clf = RandomForestClassifier(max_depth=7, n_estimators=200)
    clf.fit(X, y)

    # 3) Log the training dataset for lineage
    mlflow.log_input(dataset, context="training")

    # 4) Log + register the model in Unity Catalog (three-level name)
    input_example = X.iloc[[0]]
    mlflow.sklearn.log_model(
        sk_model=clf,
        name="model",  # MLflow 3 argument; on MLflow 2.x pass artifact_path="model"
        input_example=input_example,
        registered_model_name="prod.ml_team.churn_rf",
    )&lt;/CODE&gt;&lt;/PRE&gt;
&lt;DIV class="paragraph"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;H3 class="paragraph"&gt;Notes on your current behavior&lt;/H3&gt;
&lt;DIV class="paragraph"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;UL&gt;
&lt;LI class="paragraph"&gt;The warning you saw with an empty feature_lookups list comes from the legacy Workspace Feature Store path, which is deprecated in your workspace; passing actual &lt;STRONG&gt;FeatureLookups&lt;/STRONG&gt; uses the Feature Engineering in UC path, which captures lineage automatically. If you don’t want Feature Lookups, skip FeatureEngineeringClient and use mlflow.log_input to capture lineage from offline UC tables.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3 class="paragraph"&gt;Variations and best practices&lt;/H3&gt;
&lt;DIV class="paragraph"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;UL&gt;
&lt;LI class="paragraph"&gt;If your training data is built from multiple offline tables, log each source:
&lt;UL&gt;
&lt;LI&gt;mlflow.log_input(mlflow.data.load_delta(table_name="catalog.schema.tableA", version="..."), "training")&lt;/LI&gt;
&lt;LI&gt;mlflow.log_input(mlflow.data.load_delta(table_name="catalog.schema.tableB", version="..."), "training")&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;If you train on an ephemeral DataFrame (not a UC table), persist a &lt;STRONG&gt;snapshot&lt;/STRONG&gt; to UC first (for reproducibility and lineage), then load and log that snapshot with a version number.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;You can also log &lt;STRONG&gt;evaluation&lt;/STRONG&gt; datasets:
&lt;UL&gt;
&lt;LI&gt;mlflow.log_input(dataset_eval, context="evaluation")&lt;/LI&gt;
&lt;/UL&gt;
&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV class="paragraph"&gt;Make sure your MLflow client is configured to target &lt;STRONG&gt;UC&lt;/STRONG&gt; (MLflow 3 defaults to databricks-uc; on earlier versions, set the registry URI explicitly with mlflow.set_registry_uri("databricks-uc")) and that you use the three‑level registered_model_name when logging.&lt;/DIV&gt;
&lt;/LI&gt;
&lt;/UL&gt;
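&lt;DIV class="paragraph"&gt;If it helps, the multi-table and evaluation variations above could be wrapped in a small helper along these lines. Treat it as a rough sketch rather than a tested recipe: is_three_level_name and log_uc_inputs are hypothetical names of my own, the name check assumes plain unquoted identifiers, and the MLflow calls only work where MLflow ≥ 2.11 and Spark are available (for example a Databricks notebook).&lt;/DIV&gt;
&lt;PRE&gt;
```python
import re

# Assumption: UC identifiers are plain (unquoted) names made of letters,
# digits, and underscores; backtick-quoted names are not handled here.
_UC_NAME = re.compile(r"^[A-Za-z0-9_]+\.[A-Za-z0-9_]+\.[A-Za-z0-9_]+$")


def is_three_level_name(name):
    """Return True if name looks like catalog.schema.object."""
    return bool(_UC_NAME.match(name))


def log_uc_inputs(tables, context="training"):
    """Log each (table_name, version) pair as an MLflow dataset input.

    Hypothetical helper: intended to be called inside an MLflow run on a
    cluster where Spark can read the listed Unity Catalog tables.
    """
    # Validate every name first so a typo fails before anything is logged.
    for table_name, _ in tables:
        if not is_three_level_name(table_name):
            raise ValueError(f"expected catalog.schema.table, got {table_name!r}")

    import mlflow  # imported lazily so the name check works without MLflow

    for table_name, version in tables:
        dataset = mlflow.data.load_delta(table_name=table_name, version=version)
        mlflow.log_input(dataset, context=context)
```
&lt;/PRE&gt;
&lt;DIV class="paragraph"&gt;Inside a with mlflow.start_run(): block you would call log_uc_inputs once with your training tables and once more with context="evaluation" for the evaluation tables, passing the Delta versions you actually trained and evaluated on.&lt;/DIV&gt;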
&lt;DIV class="paragraph"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="paragraph"&gt;Hope this helps, Louis.&lt;/DIV&gt;</description>
      <pubDate>Sat, 01 Nov 2025 19:42:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/table-model-lineage-for-models-without-online-feature-lookups/m-p/137259#M4397</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2025-11-01T19:42:39Z</dc:date>
    </item>
  </channel>
</rss>

