How is model drift calculated when the baseline table has no timestamp column?

I am trying to understand how Databricks computes model drift when a baseline table is available. From the documentation, my understanding is that Databricks processes both the primary and the baseline tables according to the granularities specified in the monitor, stores the results in the profile metrics table, and then uses a measure such as the KS test to compare the distributions of the values from the two tables in a given window.
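To make the comparison I have in mind concrete, here is a minimal sketch of a two-sample KS statistic in plain Python. It is only an illustration of the kind of distribution comparison I assume happens, not Databricks' actual implementation. The function name `ks_statistic` and the sample data are hypothetical, and the idea that one window of the primary table is compared against the baseline values is my assumption.

```python
import random
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the empirical CDFs
    of the two samples (0.0 = identical, 1.0 = fully separated).
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    max_gap = 0.0
    # The largest gap between two step functions occurs at an observed value.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


# Hypothetical data: one feature from a baseline table (no timestamps)
# and the same feature from two time windows of the primary table.
rng = random.Random(0)
baseline = [rng.gauss(0.0, 1.0) for _ in range(500)]
window_same = [rng.gauss(0.0, 1.0) for _ in range(500)]
window_shifted = [rng.gauss(1.5, 1.0) for _ in range(500)]

stat_same = ks_statistic(window_same, baseline)
stat_shifted = ks_statistic(window_shifted, baseline)
print(f"no drift: {stat_same:.3f}, shifted: {stat_shifted:.3f}")
```

In this sketch the baseline sample needs no timestamps, because it is used as a single reference distribution. Whether Databricks treats a baseline table without a timestamp column the same way is exactly what I am unsure about.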

What I can't figure out is how this works if my baseline table has no timestamp column. The only relevant passage I found in the documentation is quite vague:

.... The exception is the timestamp column for tables used with time series or inference profiles. If columns are missing in either the primary table or the baseline table, monitoring uses best-effort heuristics to compute the output metrics

For example, when I use a model serving endpoint, the timestamp column of my primary table records the time at which a client called the endpoint to get a prediction for some query. Now, imagine I want to use my validation dataset, which has no such timestamps, as the baseline table. How does Databricks match the rows of the two tables?