How to Reduce Log Latency for AI Gateway-Enabled Inference Tables in Model Serving?

ecram

Hi everyone,

I've recently deployed a custom model using Databricks Model Serving with AI Gateway-enabled inference tables. The model is built with:

  • Python 3.11.11

  • LightGBM 4.5.0

  • MLflow 2.13.1

I've noticed that inference logs can take up to 1 hour to appear, which matches what the Databricks documentation states. My previous setup (Python 3.10.12, LightGBM 3.3.5, MLflow 2.5.0) used legacy inference tables, and logs appeared there in about 5 minutes.

Question:
Is there any way to reduce the latency of inference logs when using AI Gateway-enabled inference tables?

I understand that log delivery is now batch-based, but I'd like to know:

  • Are there any configuration options to speed this up?

  • Is there an official roadmap for reducing this latency?

  • Are there any best practices for near real-time logging (e.g., writing predictions manually to a Delta table from within the model wrapper)?
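
To make the manual-logging idea concrete, here is a rough sketch of what I have in mind. It is not tested on Model Serving. All names (BufferedPredictionLogger, LoggingModelWrapper, write_batch) are hypothetical, and write_batch stands in for whatever function would actually append rows to a Delta table; I am assuming the serving container is allowed to do that at all. In a real deployment the wrapper would presumably be an MLflow pyfunc model rather than a plain class. The idea is to buffer records in memory and flush them from a background thread so that logging never blocks or breaks a prediction request.

```python
import queue
import threading
from datetime import datetime, timezone


class BufferedPredictionLogger:
    """Buffers prediction records and passes them to a sink in small batches."""

    def __init__(self, write_batch, max_batch=100, flush_seconds=5.0):
        # write_batch is a hypothetical callable that takes a list of dicts
        # and appends them to the target table.
        self._write_batch = write_batch
        self._max_batch = max_batch
        self._flush_seconds = flush_seconds
        self._queue = queue.Queue()
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def log(self, inputs, outputs):
        # Only enqueue here, so the request path stays fast.
        self._queue.put({
            "ts": datetime.now(timezone.utc).isoformat(),
            "inputs": inputs,
            "outputs": outputs,
        })

    def _drain(self):
        # Pull at most max_batch records without blocking.
        batch = []
        while len(batch) < self._max_batch:
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                break
        return batch

    def _flush(self):
        while True:
            batch = self._drain()
            if not batch:
                return
            try:
                self._write_batch(batch)
            except Exception:
                # Drop the batch on failure: logging must not break serving.
                pass

    def _run(self):
        # Wake up every flush_seconds, or immediately when close() is called.
        while not self._stop.is_set():
            self._stop.wait(self._flush_seconds)
            self._flush()

    def close(self):
        self._stop.set()
        self._worker.join()
        self._flush()


class LoggingModelWrapper:
    """Wraps any object with a predict() method and logs each call."""

    def __init__(self, model, logger):
        self._model = model
        self._logger = logger

    def predict(self, model_input):
        preds = self._model.predict(model_input)
        self._logger.log(model_input, preds)
        return preds
```

The obvious trade-off is that anything still sitting in the buffer is lost if the container is killed, and failed writes are silently dropped, so this would complement the inference tables rather than replace them. I'd appreciate feedback on whether this pattern is advisable on Model Serving.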

Thanks in advance for your help!
Marcelo