Hi! We are using Lakehouse Monitoring to detect data drift in our metrics. However, I couldn't find any documentation of how exactly the drift metrics are calculated, which raises questions about how they are computed, in our case especially for PSI.
I would like to ask the following questions (in descending order of priority):
1. Is there documentation anywhere on how PSI and the other metrics are implemented?
2. We have a case where, for two different metrics (F1 and recall, respectively), avg_delta and wasserstein_distance are both ~0.01, but PSI is 0.02 for one metric and ~2.2 for the other. I understand that this is possible because of the binning, but it would be much more insightful if we could see the algorithm and understand why it happens.
3. We have a test case where we compare two identical distributions/arrays. For two different metrics (F1 and recall, respectively), avg_delta is 2.0E-16 and 0. Wasserstein distance is 0 in both cases, and PSI is 0.041 in both cases. We can only assume that the non-zero values are due to rounding error, but seeing the underlying algorithm would help us a lot.
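
For context, here is a minimal sketch of the textbook PSI we are comparing against. It is only our assumption of how PSI is usually computed (equal-width bins over the combined range, empty bins clipped to a small `eps`), not the Lakehouse Monitoring implementation, which is exactly what we are asking about. The function names, bin count, `eps`, and the synthetic data are all hypothetical.

```python
import numpy as np


def psi(baseline, current, n_bins=10, eps=1e-6):
    """Textbook PSI over equal-width bins (an assumed scheme, not Databricks')."""
    baseline = np.asarray(baseline, dtype=float)
    current = np.asarray(current, dtype=float)
    # Shared bin edges spanning both samples, so no value falls outside a bin.
    lo = min(baseline.min(), current.min())
    hi = max(baseline.max(), current.max())
    edges = np.linspace(lo, hi, n_bins + 1)
    # Fraction of each sample per bin.
    p = np.histogram(baseline, bins=edges)[0] / baseline.size
    q = np.histogram(current, bins=edges)[0] / current.size
    # Clip empty bins to eps to avoid log(0); eps is an arbitrary choice.
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    return float(np.sum((q - p) * np.log(q / p)))


def wasserstein_1d(a, b):
    """Wasserstein-1 distance for two equal-sized 1-D samples."""
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))


rng = np.random.default_rng(0)
shift = 0.01

# Synthetic stand-ins for two metrics: same shift, different spread.
tight = rng.normal(0.80, 0.002, 1000)
wide = rng.normal(0.80, 0.05, 1000)

for name, base in [("tight", tight), ("wide", wide)]:
    cur = base + shift
    print(name, "avg_delta:", cur.mean() - base.mean(),
          "wasserstein:", wasserstein_1d(base, cur),
          "psi:", psi(base, cur))

# Identical inputs: this textbook version returns exactly zero.
print("identical:", psi(wide, wide))
```

In this sketch, both synthetic metrics have the same avg_delta and Wasserstein distance (~0.01), yet the tightly concentrated one gets a much larger PSI because the shift moves most of its mass into different bins. That seems consistent with question 2. For identical inputs, however, this version returns exactly 0, so it does not reproduce the 0.041 from question 3.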
Thanks in advance!