Table-Model Lineage for models without online Feature Lookups
10-29-2024 01:01 AM
Hi community,
I am looking for the recommended way to achieve table-model lineage in Unity Catalog for models that don't use Feature Lookups but only offline features.
When I use FeatureEngineeringClient.create_training_set with feature_lookups plus MLflow experiment tracking, this works well and the respective feature stores show up in the model lineage. However, I haven't found a way to get the same lineage when using offline features only.
Tracking an MLflow model without FeatureEngineeringClient.create_training_set works, but then the lineage doesn't show up in Unity Catalog. Passing an empty list as feature_lookups results in this warning:
WARNING databricks.ml_features._catalog_client._catalog_client_helper: Failed to record consumer in the catalog. Exception: {'error_code': 'NOT_FOUND', 'message': 'Workspace Feature Store has been deprecated in the current workspace. Databricks recommends using Feature Engineering in Unity Catalog.
The lineage doesn't show up in that case either. This is particularly odd since there is no such warning when I pass actual FeatureLookups instead of the empty list.
Thanks for any help
#featurestore #mlflow
11-01-2025 12:42 PM
Hey @ssequ, sorry this fell through the cracks, but I have some ideas for you to consider.
Recommended approach (offline features only)
- Ensure you’re on MLflow ≥ 2.11; table→model lineage uses mlflow.log_input and is supported from 2.11 onward.
- Load your training data from UC tables and create MLflow dataset objects (for example with mlflow.data.load_delta) so lineage can resolve to UC assets.
- Call mlflow.log_input(dataset, context="training") for each upstream table, or for a snapshot table you create for training, then log and register your model to UC; lineage will appear on the model version’s Lineage tab in Catalog Explorer.
- Include a model signature (either provide it or let MLflow infer it via input_example), because UC requires model versions to have signatures when registering.
Minimal example
# 3) Log the training dataset for lineage
mlflow.log_input(dataset, context="training")
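For context, here is a fuller sketch of how the steps listed above could fit together end to end. It is only an illustration under some assumptions: the function names, the LinearRegression model, and the idea of passing a label column are all made up for the example, and it assumes the table has numeric feature columns, that a Spark session is active (as on a Databricks cluster), and that MLflow ≥ 2.11 and scikit-learn are installed.

```python
def require_three_level_name(name: str) -> str:
    """Return `name` unchanged if it looks like catalog.schema.object, else raise."""
    parts = name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.name, got {name!r}")
    return name


def train_and_log_with_lineage(table_name, label_col, registered_model_name, version=None):
    """Hypothetical helper: train on a UC table and log table->model lineage."""
    # Imported lazily so this file can be imported without MLflow/scikit-learn installed.
    import mlflow
    from mlflow.models import infer_signature
    from sklearn.linear_model import LinearRegression  # placeholder model choice

    require_three_level_name(table_name)
    require_three_level_name(registered_model_name)
    mlflow.set_registry_uri("databricks-uc")  # register models in Unity Catalog

    # Load the UC table as an MLflow dataset (needs an active Spark session).
    dataset = mlflow.data.load_delta(table_name=table_name, version=version)
    pdf = dataset.df.toPandas()
    X, y = pdf.drop(columns=[label_col]), pdf[label_col]

    with mlflow.start_run():
        model = LinearRegression().fit(X, y)
        # Log the training dataset for lineage.
        mlflow.log_input(dataset, context="training")
        # Log and register the model with an inferred signature.
        signature = infer_signature(X, model.predict(X))
        mlflow.sklearn.log_model(
            model,
            "model",
            signature=signature,
            registered_model_name=registered_model_name,
        )
```

A call might look like train_and_log_with_lineage("main.ml.training_data", "label", "main.ml.my_model"), where all three names are placeholders for your own catalog, schema, and objects.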
Notes on your current behavior
- The warning you saw with an empty feature_lookups list appears to come from the legacy Workspace Feature Store code path, whereas passing actual FeatureLookups goes through the Feature Engineering in Unity Catalog path, which captures lineage automatically. If you don’t want Feature Lookups, skip FeatureEngineeringClient and use mlflow.log_input to capture lineage from offline UC tables.
Variations and best practices
- If your training data is built from multiple offline tables, log each source:
  - mlflow.log_input(mlflow.data.load_delta(table_name="catalog.schema.tableA", version="..."), "training")
  - mlflow.log_input(mlflow.data.load_delta(table_name="catalog.schema.tableB", version="..."), "training")
- If you train on an ephemeral DataFrame (not a UC table), persist a snapshot to UC first (for reproducibility and lineage), then load and log that snapshot with a version number.
- You can also log evaluation datasets:
  - mlflow.log_input(dataset_eval, context="evaluation")
- Make sure your MLflow client is configured to target UC (MLflow 3 defaults to databricks-uc; otherwise set the registry URI explicitly with mlflow.set_registry_uri("databricks-uc")) and that you use the three‑level registered_model_name (catalog.schema.model) when logging.
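The multi-table and snapshot variations above can be sketched roughly as follows. The helper names and the dict-of-sources shape are assumptions made for this example, not part of any Databricks API; the snapshot helper also assumes a Spark DataFrame on PySpark 3.3+ (for df.sparkSession) and write access to the target UC schema.

```python
def split_uc_name(name: str):
    """Split a three-level UC name into (catalog, schema, object); raise otherwise."""
    parts = name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {name!r}")
    return tuple(parts)


def log_source_tables(sources, context="training"):
    """Log each upstream UC table as an MLflow dataset input.

    `sources` maps a three-level table name to a Delta version (or None for latest).
    """
    import mlflow  # lazy import: only needed when the helper is actually called

    for table_name, version in sources.items():
        split_uc_name(table_name)  # fail early on non-UC names
        dataset = mlflow.data.load_delta(table_name=table_name, version=version)
        mlflow.log_input(dataset, context=context)


def snapshot_and_log(df, snapshot_table, context="training"):
    """Persist an ephemeral Spark DataFrame to UC, then log that exact version."""
    import mlflow

    split_uc_name(snapshot_table)
    df.write.format("delta").mode("overwrite").saveAsTable(snapshot_table)
    # The most recent history entry is the version that was just written.
    latest = df.sparkSession.sql(f"DESCRIBE HISTORY {snapshot_table} LIMIT 1").first()
    version = int(latest["version"])
    dataset = mlflow.data.load_delta(table_name=snapshot_table, version=version)
    mlflow.log_input(dataset, context=context)
    return version
```

Calling these inside the same mlflow.start_run() block in which you log the model keeps the dataset inputs and the model on one run; passing context="evaluation" works the same way for evaluation data.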