Community Platform Discussions

How is model drift calculated when the baseline table has no timestamp column?

MohsenJ
Contributor

I'm trying to understand how Databricks computes model drift when a baseline table is available. From the documentation, my understanding is that Databricks processes both the primary and the baseline tables according to the granularities specified in the monitor, stores the results in the profile metrics table, and then uses a statistical test such as the KS test to compare the distributions of values in the two tables within a given window.
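A minimal Python sketch of the windowed comparison described above. It is written in plain Python, not against any Databricks API. It also illustrates one possible, unconfirmed way a baseline without a timestamp could be handled: every daily window of the primary table is compared against the entire baseline. The helper names, the 1-day granularity, and that handling are assumptions for illustration, not documented Databricks behavior. The two-sample KS statistic compares two empirical distributions, so it does not need any row-level matching between the tables.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    between the empirical CDFs of two samples. It compares distributions,
    so rows of the two samples are never paired with each other."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The maximum gap is always attained at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def drift_per_window(primary_rows, baseline_values):
    """Hypothetical helper, not a Databricks API.

    primary_rows: (timestamp, value) pairs from the primary/inference table.
    baseline_values: one column from a baseline table with NO timestamp.
    ASSUMPTION: the primary table is bucketed into 1-day windows and each
    window is compared against the whole baseline, because the baseline
    itself cannot be split into windows."""
    windows = defaultdict(list)
    for ts, value in primary_rows:
        windows[ts.date()].append(value)
    return {
        day: ks_statistic(values, baseline_values)
        for day, values in sorted(windows.items())
    }


# Illustrative data only: a validation-set column used as the baseline.
baseline = [0.1, 0.2, 0.3, 0.4, 0.5]
primary = [
    (datetime(2024, 1, 1, 9), 0.1),
    (datetime(2024, 1, 1, 15), 0.3),
    (datetime(2024, 1, 2, 10), 0.9),
    (datetime(2024, 1, 2, 11), 1.0),
]
for day, d in drift_per_window(primary, baseline).items():
    print(day, round(d, 3))  # 2024-01-01 0.4, then 2024-01-02 1.0
```

The second day's values lie entirely above the baseline's range, so its KS statistic reaches the maximum of 1.0.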

What I can't figure out is how this works when my baseline table has no timestamp column. The only relevant information I found in the documentation is this rather vague passage:

.... The exception is the timestamp column for tables used with time series or inference profiles. If columns are missing in either the primary table or the baseline table, monitoring uses best-effort heuristics to compute the output metrics


For example, when I use a model serving endpoint, the timestamp column of my primary table records when a client called the endpoint to get a prediction for a query. Now imagine I want to use my validation dataset as the baseline table. How does Databricks match the rows of the two tables?

