Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Lakehouse Monitoring of Inference Table

grajee
New Contributor II

All,

I'm trying to set up a Lakehouse Monitoring process for the widely available WineQuality model. While setting up the serving endpoint, I enabled the "Inference Table" option, and the inference table was created automatically. The columns in the winequality_payload table are as follows:

  1. client_request_id
  2. databricks_request_id
  3. date
  4. timestamp_ms
  5. status_code
  6. execution_time_ms
  7. request
  8. response
  9. sampling_fraction
  10. request_metadata (MAP type)

The request_metadata map contains the keys "model_name", "endpoint_name", and "model_version".

While configuring the monitor against the inference table, I selected request_metadata as the model_id column, but it fails with the following error:

The given `DataMonitorInfo` is invalid for the following reason(s): - For table `dev_tst_mls.winequality_uc.winequality_payload`: The specified `model_id_col` (`request_metadata`) must be a groupable column, but instead it is a MAP type. Please check that all referenced columns exist in the table(s) and have compatible column type.

As you can see, there is no other column that I can pick for the model_id column. Why am I getting this error and what alternatives do I have?

Following a suggestion from Databricks Assistant, I then created a view that extracts model_name and model_version and used this view to create the monitor. The monitor is created successfully, but the dashboard shows no data at all despite my making several scoring attempts.

I tried troubleshooting and found that the datetime in the window column of the profile_metrics table is way off:

profile_metrics table --> Column window  
start: "+057064-08-22T07:05:00.000Z" 
end: "+057064-08-22T07:10:00.000Z"

As you can see, I ran this today (02/03/2025), but the window dates are tens of thousands of years in the future!

According to the linked documentation, the date column is "The UTC date on which the model serving request was received," and the timestamp_ms column is "The timestamp in epoch milliseconds on when the model serving request was received."

I checked the date column in the inference table and it correctly shows 2025-02-03, but timestamp_ms shows values like 1738620594270, which I converted to "2024-12-31 23:09:54.270".

Am I doing something wrong? Has anyone experienced this before?

Thanks,

grajee

1 REPLY

Louis_Frolio
Databricks Employee

Hello @grajee ,  I can see you're dealing with two separate issues here. Let me address both:

Issue 1: The model_id column (request_metadata MAP type)

You're correct that request_metadata is a MAP type and can't be directly used as the model_id column in Lakehouse Monitoring. Your approach of creating a view that extracts model_name and model_version from the request_metadata MAP is the right solution. You can create a view like this:

```sql
CREATE OR REPLACE VIEW your_catalog.your_schema.winequality_inference_view AS
SELECT
  *,
  request_metadata['model_name'] AS model_name,
  request_metadata['model_version'] AS model_version,
  CONCAT(request_metadata['model_name'], '_', request_metadata['model_version']) AS model_id
FROM your_catalog.your_schema.winequality_payload
```

Then create the monitor on this view using the model_id column.
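If you prefer to create the monitor programmatically, a sketch using the `databricks.lakehouse_monitoring` Python API might look like the following. Treat this as an illustration rather than a definitive recipe: the catalog/schema/view names, the `prediction` column, and the `inference_ts` column are assumptions, and the exact signature can vary by SDK version. Note that the monitor's timestamp column generally needs to be a TIMESTAMP type, so consider adding `CAST(timestamp_ms / 1000 AS TIMESTAMP) AS inference_ts` to the view rather than passing the raw epoch-millisecond LONG:

```python
# Hypothetical sketch -- assumes the databricks-lakehouse-monitoring package
# and a view that exposes a TIMESTAMP column derived from timestamp_ms, e.g.
#   CAST(timestamp_ms / 1000 AS TIMESTAMP) AS inference_ts
import databricks.lakehouse_monitoring as lm

lm.create_monitor(
    table_name="your_catalog.your_schema.winequality_inference_view",  # the view, not the raw payload table
    profile_type=lm.InferenceLog(
        timestamp_col="inference_ts",   # TIMESTAMP column, not the raw LONG
        model_id_col="model_id",        # the concatenated name_version column from the view
        prediction_col="prediction",    # assumption: adjust to your unpacked response schema
        problem_type="regression",      # WineQuality is typically scored as regression
        granularities=["5 minutes"],
    ),
    output_schema_name="your_catalog.your_schema",
)
```

Running this requires a Databricks workspace and the monitoring package installed, so it is shown here only to indicate which configuration knobs matter.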

Issue 2: The timestamp and profile_metrics date problem

I noticed something important in your post. The timestamp_ms value 1738620594270 actually converts to **2025-02-03 22:09:54** (not 2024-12-31 as you mentioned). So your timestamp_ms column is correct and matches your date column.
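You can verify this conversion yourself; epoch milliseconds just need to be divided by 1000 before converting:

```python
from datetime import datetime, timezone

ts_ms = 1738620594270  # value from the inference table
ts = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
print(ts.strftime("%Y-%m-%d %H:%M:%S"))  # 2025-02-03 22:09:54
```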

The real issue is that the profile_metrics table is showing dates in year 57064, which suggests Lakehouse Monitoring may be misinterpreting your timestamp_ms column. This could happen if:

1. The monitoring is treating timestamp_ms as seconds instead of milliseconds
2. There's a unit mismatch in how the timestamp column is being processed
3. The timestamp column being used for windowing isn't correctly specified
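A quick sanity check supports the milliseconds-read-as-seconds theory: interpreting your epoch-millisecond value as seconds lands almost exactly in the year you observed. (Python's datetime can't represent years beyond 9999, so the year is computed arithmetically here.)

```python
ts_ms = 1738620594270          # epoch milliseconds from the inference table
SECONDS_PER_YEAR = 31_556_952  # average Gregorian year in seconds

# If the monitor reads this value as *seconds* since 1970:
implied_year = 1970 + ts_ms // SECONDS_PER_YEAR
print(implied_year)  # 57064 -- matching the "+057064-08-22" window you observed
```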

Troubleshooting steps:

1. Verify your monitor configuration explicitly specifies timestamp_ms as the timestamp column and that it's correctly formatted as epoch milliseconds (LONG type)
2. Check if there are any timezone configuration issues in your monitor setup
3. Try recreating the monitor and ensure you're using the InferenceLog profile type (not TimeSeries) for inference tables
4. Confirm your inference table schema matches the expected format with timestamp_ms as a LONG type

You might also want to run a query directly on your inference table to validate the timestamp_ms values are reasonable:

```sql
SELECT
  `date`,
  timestamp_ms,
  FROM_UNIXTIME(timestamp_ms / 1000) AS converted_timestamp
FROM your_catalog.your_schema.winequality_payload
LIMIT 10
```

Hope this helps, Louis.
