How is model drift calculated when the baseline table has no timestamp column?

I am trying to understand how Databricks computes model drift when a baseline table is available. My understanding from the documentation is that Databricks processes both the primary and the baseline tables according to the granularities specified in the monitor, stores the results in the profile metrics table, and then uses a measure such as the Kolmogorov-Smirnov (KS) test to compare the distributions of values from the two tables in a given window.

What I can't figure out is how this works if my baseline table has no timestamp column. The only information I found in the documentation is this passage, which is very vague:

.... The exception is the timestamp column for tables used with time series or inference profiles. If columns are missing in either the primary table or the baseline table, monitoring uses best-effort heuristics to compute the output metrics

For example, when I use the model serving endpoint, the timestamp column of my primary table corresponds to the time when a client calls the endpoint to compute the prediction for some query. Now, imagine I want to use my validation dataset as the baseline table. How does Databricks match the rows of the two tables?
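
To make the question concrete, here is a minimal sketch of what I imagine might be happening. This is only my assumption, not documented Databricks behavior: the baseline is treated as one undated distribution, and every time window of the primary table is compared against all of it. The function names (`ks_statistic`, `drift_per_window`), the daily granularity, and the data are all hypothetical, and the KS statistic is hand-rolled so the example has no dependencies (in practice something like `scipy.stats.ks_2samp` would be used).

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    # The largest gap is always reached at one of the observed values.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def drift_per_window(primary_rows, baseline_values):
    """Bucket primary rows by day and compare each bucket to the whole baseline.

    primary_rows: iterable of (timestamp, value) pairs.
    baseline_values: list of values with no timestamps (assumption: never windowed).
    """
    windows = defaultdict(list)
    for ts, value in primary_rows:
        windows[ts.date()].append(value)
    return {
        day: ks_statistic(values, baseline_values)
        for day, values in sorted(windows.items())
    }


# Hypothetical data: validation-set predictions as the baseline (no timestamps),
# and endpoint predictions with request timestamps as the primary table.
baseline = [0.1, 0.2, 0.3, 0.4, 0.5]
primary = [
    (datetime(2024, 1, 1, 9), 0.1),
    (datetime(2024, 1, 1, 15), 0.3),
    (datetime(2024, 1, 1, 21), 0.5),
    (datetime(2024, 1, 2, 9), 0.9),
    (datetime(2024, 1, 2, 15), 1.0),
]
drift = drift_per_window(primary, baseline)
```

Under this reading no row-level matching is needed at all: the first day's values sit inside the baseline range and get a small statistic, while the second day's values lie entirely above it and get the maximum statistic of 1.0. Is this roughly what the monitor does, or are rows matched in some other way?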