When you create a Databricks Lakehouse Monitoring monitor with an Inference profile, the system automatically generates two metric tables: a profile metrics table and a drift metrics table. Here's how this process works:
Background Processing
When you create a monitor in Databricks Lakehouse Monitoring, the system automatically does the following (a minimal creation sketch follows this list):
1. Analyzes your inference table data containing model inputs, predictions, and optional ground-truth labels
2. Computes a rich set of metrics based on the data in your table
3. Stores these metrics in two Delta tables in your specified Unity Catalog schema
4. Creates a customizable dashboard to visualize these metrics
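To make that concrete, here is a minimal sketch of creating an inference monitor with the Databricks SDK for Python. The table, schema, directory, and column names are placeholders for your own objects, and parameter names can vary slightly between SDK versions, so treat this as illustrative rather than a copy-paste recipe.

```python
# Sketch: create a monitor with an Inference profile using the Databricks SDK for Python.
# All table/schema/column names below are placeholders for your own objects.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.catalog import (
    MonitorInferenceLog,
    MonitorInferenceLogProblemType,
)

w = WorkspaceClient()

monitor = w.quality_monitors.create(
    table_name="main.ml.inference_table",           # the inference table to monitor
    output_schema_name="main.ml_monitoring",        # where the two metric tables are written
    assets_dir="/Workspace/Users/you@example.com/monitoring",  # dashboard and assets location
    inference_log=MonitorInferenceLog(
        granularities=["1 day"],                    # time windows used to aggregate metrics
        timestamp_col="inference_ts",               # request timestamp column
        model_id_col="model_version",               # lets you compare model versions
        prediction_col="prediction",
        label_col="label",                          # optional ground-truth column
        problem_type=MonitorInferenceLogProblemType.PROBLEM_TYPE_CLASSIFICATION,
    ),
)
```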
The Two Generated Tables
Profile Metrics Table
The profile metrics table (`{output_schema}.{table_name}_profile_metrics`) contains summary statistics computed per column, for each combination of time window, slice, and grouping column. For the Inference Log profile specifically, it includes (a query sketch follows this list):
- Basic statistics (count, null count, avg, min, max, etc.)
- Model performance metrics such as accuracy, precision, recall, and F1-score for classification models
- Regression metrics (MSE, RMSE, MAE, etc.) for regression models
- Fairness and bias metrics, when configured
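As a rough illustration, you can query this table directly from a Databricks notebook (where `spark` is already defined). The table name and metric columns below (`column_name`, `window`, `accuracy_score`, and the `:table` marker for table-level rows) are assumptions based on a typical generated schema, so verify them against your own `_profile_metrics` table.

```python
# Sketch: inspect classification accuracy per time window from the profile metrics table.
# Assumes a Databricks notebook where `spark` exists; table/column names are illustrative.
profile_df = spark.table("main.ml_monitoring.inference_table_profile_metrics")

(
    profile_df
    .filter("column_name = ':table'")   # table-level rows typically carry model performance metrics
    .select("window", "accuracy_score")
    .orderBy("window")
    .show(truncate=False)
)
```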
Drift Metrics Table
The drift metrics table (`{output_schema}.{table_name}_drift_metrics`) contains statistics that track changes in distribution over time. It calculates two kinds of drift (see the sketch after this list):
- Consecutive drift: compares metrics between consecutive time windows
- Baseline drift: compares metrics against a baseline distribution (if a baseline table was provided)
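Similarly, here is a hedged sketch of reading the drift metrics table. The `drift_type` and `js_distance` column names are assumptions drawn from a typical generated schema and may differ in your workspace, so check the actual schema before relying on them.

```python
# Sketch: look at consecutive-window drift for each column of the inference table.
# Column names (drift_type, js_distance) are illustrative; check your _drift_metrics schema.
drift_df = spark.table("main.ml_monitoring.inference_table_drift_metrics")

(
    drift_df
    .filter("drift_type = 'CONSECUTIVE'")   # vs. 'BASELINE' when a baseline table was provided
    .select("window", "column_name", "js_distance")
    .orderBy("window", "column_name")
    .show(truncate=False)
)
```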
How Metrics Are Computed
For an Inference profile, Databricks:
1. Processes your inference table according to the time windows you specified (hourly, daily, weekly, etc.)
2. Computes metrics for each model ID separately, allowing comparison between model versions
3. Calculates both data quality metrics (for model inputs) and model performance metrics (comparing predictions to ground truth labels, if provided)
4. Stores all these metrics in the Delta tables with appropriate metadata (time window, granularity, model ID, etc.)
The computation happens as a Databricks job that runs according to your refresh schedule, processing either the full table or incremental changes depending on your configuration.
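If you want to compute metrics outside the schedule, the SDK also exposes an on-demand refresh. This is a sketch under the same assumptions as above (placeholder table name, SDK method surface may differ slightly by version):

```python
# Sketch: trigger an on-demand metrics refresh and check its status.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

refresh = w.quality_monitors.run_refresh(table_name="main.ml.inference_table")
info = w.quality_monitors.get_refresh(
    table_name="main.ml.inference_table",
    refresh_id=refresh.refresh_id,
)
print(info.state)   # e.g. pending / running / success
```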
This automated process eliminates the need for you to manually set up monitoring pipelines, allowing you to focus on analyzing the results through the generated dashboard or by querying the metric tables directly.
Also note that you are charged for the compute used to perform the work described above.
Cheers, Louis.