When you create a Databricks Lakehouse Monitoring monitor with an Inference profile, the system automatically generates two metric tables: a profile metrics table and a drift metrics table. Here's how this process works:
Background Processing
When you create a monitor in Databricks Lakehouse Monitoring, the system automatically does the following (a minimal creation sketch follows this list):
1. Analyzes your inference table data containing model inputs, predictions, and optional ground-truth labels
2. Computes a rich set of metrics based on the data in your table
3. Stores these metrics in two Delta tables in your specified Unity Catalog schema
4. Creates a customizable dashboard to visualize these metrics
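To make that concrete, here is a minimal sketch of creating an inference monitor with the Databricks SDK for Python. The table, schema, directory, and column names are placeholders for your own objects, and parameter names can vary slightly between SDK versions, so treat this as illustrative rather than a copy-paste recipe.

```python
# Sketch: create a monitor with an Inference profile using the Databricks SDK for Python.
# All table/schema/column names below are placeholders for your own objects.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.catalog import (
    MonitorInferenceLog,
    MonitorInferenceLogProblemType,
)

w = WorkspaceClient()

monitor = w.quality_monitors.create(
    table_name="main.ml.inference_table",           # the inference table to monitor
    output_schema_name="main.ml_monitoring",        # where the two metric tables are written
    assets_dir="/Workspace/Users/you@example.com/monitoring",  # dashboard and assets location
    inference_log=MonitorInferenceLog(
        granularities=["1 day"],                    # time windows used to aggregate metrics
        timestamp_col="inference_ts",               # request timestamp column
        model_id_col="model_version",               # lets you compare model versions
        prediction_col="prediction",
        label_col="label",                          # optional ground-truth column
        problem_type=MonitorInferenceLogProblemType.PROBLEM_TYPE_CLASSIFICATION,
    ),
)
```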
The Two Generated Tables
Profile Metrics Table
The profile metrics table (`{output_schema}.{table_name}_profile_metrics`) contains summary statistics computed per column, for each combination of time window, slice, and grouping column. For the Inference Log profile specifically, it includes (a query sketch follows this list):
- Basic statistics (count, null count, avg, min, max, etc.)
- Model performance metrics such as accuracy, precision, recall, and F1-score for classification models
- Regression metrics (MSE, RMSE, MAE, etc.) for regression models
- Fairness and bias metrics, when configured
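As a rough illustration, you can query this table directly from a Databricks notebook (where `spark` is already defined). The table name and metric columns below (`column_name`, `window`, `accuracy_score`, and the `:table` marker for table-level rows) are assumptions based on a typical generated schema, so verify them against your own `_profile_metrics` table.

```python
# Sketch: inspect classification accuracy per time window from the profile metrics table.
# Assumes a Databricks notebook where `spark` exists; table/column names are illustrative.
profile_df = spark.table("main.ml_monitoring.inference_table_profile_metrics")

(
    profile_df
    .filter("column_name = ':table'")   # table-level rows typically carry model performance metrics
    .select("window", "accuracy_score")
    .orderBy("window")
    .show(truncate=False)
)
```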
Drift Metrics Table
The drift metrics table (`{output_schema}.{table_name}_drift_metrics`) contains statistics that track changes in distribution over time. It calculates two kinds of drift (see the sketch after this list):
- Consecutive drift: compares metrics between consecutive time windows
- Baseline drift: compares metrics against a baseline distribution (if a baseline table was provided)
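Similarly, here is a hedged sketch of reading the drift metrics table. The `drift_type` and `js_distance` column names are assumptions drawn from a typical generated schema and may differ in your workspace, so check the actual schema before relying on them.

```python
# Sketch: look at consecutive-window drift for each column of the inference table.
# Column names (drift_type, js_distance) are illustrative; check your _drift_metrics schema.
drift_df = spark.table("main.ml_monitoring.inference_table_drift_metrics")

(
    drift_df
    .filter("drift_type = 'CONSECUTIVE'")   # vs. 'BASELINE' when a baseline table was provided
    .select("window", "column_name", "js_distance")
    .orderBy("window", "column_name")
    .show(truncate=False)
)
```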
How Metrics Are Computed
For an Inference profile, Databricks:
1. Processes your inference table according to the time windows you specified (hourly, daily, weekly, etc.)
2. Computes metrics for each model ID separately, allowing comparison between model versions
3. Calculates both data quality metrics (for model inputs) and model performance metrics (comparing predictions to ground truth labels, if provided)
4. Stores all these metrics in the Delta tables with appropriate metadata (time window, granularity, model ID, etc.)
The computation happens as a Databricks job that runs according to your refresh schedule, processing either the full table or incremental changes depending on your configuration.
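If you want to compute metrics outside the schedule, the SDK also exposes an on-demand refresh. This is a sketch under the same assumptions as above (placeholder table name, SDK method surface may differ slightly by version):

```python
# Sketch: trigger an on-demand metrics refresh and check its status.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

refresh = w.quality_monitors.run_refresh(table_name="main.ml.inference_table")
info = w.quality_monitors.get_refresh(
    table_name="main.ml.inference_table",
    refresh_id=refresh.refresh_id,
)
print(info.state)   # e.g. pending / running / success
```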
This automated process eliminates the need for you to manually set up monitoring pipelines, allowing you to focus on analyzing the results through the generated dashboard or by querying the metric tables directly.
Also note that you are charged for the compute used to perform the work described above.
Cheers, Louis.