Hi everyone,
I've recently deployed a custom model using Databricks Model Serving with AI Gateway-enabled inference tables. The model is built with:
Python 3.11.11
LightGBM 4.5.0
MLflow 2.13.1
I’ve noticed that inference logs can take up to an hour to appear, which matches the Databricks documentation. This is quite different from my previous setup (Python 3.10.12, LightGBM 3.3.5, MLflow 2.5.0), where logs appeared in roughly 5 minutes using the legacy inference tables.
Question:
Is there any way to reduce the latency of inference logs when using AI Gateway-enabled inference tables?
I understand the new system is based on batch delivery, but I’d like to know:
Are there configuration options to speed this up?
Is there an official roadmap for reducing this latency?
Are there best practices for near-real-time logging (e.g., logging predictions manually into a Delta table inside the model wrapper)? I’ve sketched that idea below.
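To make the third question concrete, here is a minimal sketch of what I have in mind: a pyfunc wrapper around the LightGBM booster that appends each request/response batch to a Delta table via the deltalake (delta-rs) package. The class name, the log_table_uri parameter, and the "lgbm_model" artifact key are all my own placeholders, not anything from the Databricks docs, and this assumes the serving container has credentials for the storage location:

import datetime

import mlflow.pyfunc
import pandas as pd
from deltalake import write_deltalake  # delta-rs; must be in the serving env


class LoggingLGBMWrapper(mlflow.pyfunc.PythonModel):
    def __init__(self, log_table_uri):
        # Hypothetical location, e.g. "abfss://.../inference_log" or
        # "s3://.../inference_log"; the container needs write access to it.
        self.log_table_uri = log_table_uri

    def load_context(self, context):
        import lightgbm as lgb
        # Assumes the native LightGBM model file was logged as an
        # artifact under the key "lgbm_model".
        self.model = lgb.Booster(model_file=context.artifacts["lgbm_model"])

    def predict(self, context, model_input: pd.DataFrame, params=None):
        preds = self.model.predict(model_input)
        log_df = model_input.copy()
        log_df["prediction"] = preds
        log_df["logged_at"] = datetime.datetime.now(datetime.timezone.utc)
        try:
            # Synchronous append: adds latency to every request, so a
            # buffered/async write would be preferable in production.
            write_deltalake(self.log_table_uri, log_df, mode="append")
        except Exception:
            # Never fail the prediction because logging failed.
            pass
        return preds

Is something along these lines a reasonable workaround, or is there a better pattern (e.g., buffering writes, or emitting to a queue instead of writing Delta directly from the endpoint)?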
Thanks in advance for your help!
Marcelo