How to Reduce Log Latency for AI Gateway-Enabled Inference Tables in Model Serving?

ecram · 06-30-2025

Hi everyone,

I've recently deployed a custom model using Databricks Model Serving with AI Gateway-enabled inference tables. The model is built with:

Python 3.11.11
LightGBM 4.5.0
MLflow 2.13.1

I’ve noticed that the inference logs can take up to 1 hour to appear, as mentioned in the Databricks documentation. This is quite different from a previous setup (Python 3.10.12, LightGBM 3.3.5, MLflow 2.5.0) where logs appeared in ~5 minutes using legacy inference tables.

Question:
Is there any way to reduce the latency of inference logs when using AI Gateway-enabled inference tables?

I understand the system is now based on batch delivery, but I’d like to know:

Are there configuration options to speed this up?
Is there an official roadmap to reduce this latency?
Are there best practices for implementing near real-time logging (e.g., logging predictions manually into a Delta table within the model wrapper)?
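
To make the manual-logging idea concrete, here is a minimal sketch of what such a wrapper could look like. It is not a Databricks or MLflow API: `LoggingModelWrapper` and the `sink` callable are hypothetical names, and the sketch assumes the model returns one numeric prediction per input row. In a real endpoint the sink would be whatever appends rows to a Delta table. Records are queued and flushed from a background thread so that logging does not add to request latency.

```python
import queue
import threading
import time
import uuid


class LoggingModelWrapper:
    """Wraps any object with .predict() and logs each call asynchronously.

    `sink` is a hypothetical callable that receives a list of dict rows,
    e.g. a function that appends them to a Delta table.
    """

    def __init__(self, model, sink, batch_size=100, flush_interval_s=5.0):
        self._model = model
        self._sink = sink
        self._batch_size = batch_size
        self._flush_interval_s = flush_interval_s
        self._queue = queue.Queue()
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def predict(self, rows):
        preds = self._model.predict(rows)
        # Enqueue only; the actual write happens off the request path.
        self._queue.put({
            "request_id": str(uuid.uuid4()),
            "timestamp_ms": int(time.time() * 1000),
            "request": rows,
            # Assumption: one numeric prediction per input row.
            "response": [float(p) for p in preds],
        })
        return preds

    def _flush(self, batch):
        if not batch:
            return
        try:
            self._sink(batch)
        except Exception:
            # A logging failure should not break serving; a real
            # implementation would count or report these.
            pass

    def _run(self):
        batch, last_flush = [], time.monotonic()
        while not (self._stop.is_set() and self._queue.empty()):
            try:
                batch.append(self._queue.get(timeout=0.1))
            except queue.Empty:
                pass
            due = time.monotonic() - last_flush >= self._flush_interval_s
            if len(batch) >= self._batch_size or (batch and due):
                self._flush(batch)
                batch, last_flush = [], time.monotonic()
        self._flush(batch)  # drain whatever is left on shutdown

    def close(self):
        self._stop.set()
        self._worker.join()
```

The batch size and flush interval trade log freshness against the number of small writes. This runs alongside the inference table rather than changing its delivery latency.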

Thanks in advance for your help!
Marcelo
