How to Reduce Log Latency for AI Gateway-Enabled Inference Tables in Model Serving?

ecram — Mon, 30 Jun 2025 22:34:09 GMT

Hi everyone,

I've recently deployed a custom model using Databricks Model Serving with AI Gateway-enabled inference tables. The model is built with:

Python 3.11.11
LightGBM 4.5.0
MLflow 2.13.1

I’ve noticed that the inference logs can take up to 1 hour to appear, as mentioned in the Databricks documentation. This is quite different from a previous setup (Python 3.10.12, LightGBM 3.3.5, MLflow 2.5.0) where logs appeared in ~5 minutes using legacy inference tables.

Question:
Is there any way to reduce the latency of inference logs when using AI Gateway-enabled inference tables?

I understand the system is now based on batch delivery, but I’d like to know:

Are there configuration options to speed this up?
Is there an official roadmap to reduce this latency?
Are there best practices for implementing near real-time logging (e.g., logging predictions manually into a Delta table within the model wrapper)?
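
To make the last option concrete, here is a rough sketch of the kind of wrapper I have in mind. It is not tested against a real endpoint, and several things are assumptions on my part: the class names BufferedPredictionLogger and LoggingModelWrapper are made up, the sink callable is a placeholder for whatever would actually write rows to a Delta table, and the batch size and age limits are arbitrary. In a real deployment the wrapper would presumably subclass mlflow.pyfunc.PythonModel, whose predict(context, model_input) signature it mirrors.

```python
import json
import threading
import time
import uuid
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]


class BufferedPredictionLogger:
    """Buffers prediction records and hands them to a sink in small batches.

    The sink is a hypothetical callable; how it writes to a Delta table
    is deliberately left out of this sketch.
    """

    def __init__(self, sink: Callable[[List[Record]], None],
                 max_records: int = 100, max_age_seconds: float = 30.0) -> None:
        self._sink = sink
        self._max_records = max_records
        self._max_age_seconds = max_age_seconds
        self._buffer: List[Record] = []
        self._oldest: Optional[float] = None
        self._lock = threading.Lock()

    def log(self, inputs: Any, outputs: Any) -> None:
        now = time.time()
        record = {
            "request_id": str(uuid.uuid4()),
            "logged_at": now,
            # default=str keeps non-JSON types (e.g. timestamps) from raising.
            "inputs": json.dumps(inputs, default=str),
            "outputs": json.dumps(outputs, default=str),
        }
        batch: List[Record] = []
        with self._lock:
            self._buffer.append(record)
            if self._oldest is None:
                self._oldest = now
            full = len(self._buffer) >= self._max_records
            stale = now - self._oldest >= self._max_age_seconds
            if full or stale:
                batch = self._drain()
        # The age check only runs when a request arrives, so an idle
        # endpoint keeps its last records until flush() is called.
        self._send(batch)

    def flush(self) -> None:
        with self._lock:
            batch = self._drain()
        self._send(batch)

    def _drain(self) -> List[Record]:
        batch, self._buffer, self._oldest = self._buffer, [], None
        return batch

    def _send(self, batch: List[Record]) -> None:
        if not batch:
            return
        try:
            self._sink(batch)
        except Exception:
            # A logging failure should not fail the prediction request;
            # in this sketch the batch is simply dropped.
            pass


class LoggingModelWrapper:
    """Wraps any object with a predict() method and logs each call."""

    def __init__(self, model: Any, logger: BufferedPredictionLogger) -> None:
        self._model = model
        self._logger = logger

    def predict(self, context: Any, model_input: Any) -> Any:
        predictions = self._model.predict(model_input)
        self._logger.log(model_input, predictions)
        return predictions
```

The buffering is there so that each request does not pay for its own write; the trade-off is that records still sitting in the buffer are lost if the serving container is restarted.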

Thanks in advance for your help!
Marcelo

Re: How to Reduce Log Latency for AI Gateway-Enabled Inference Tables in Model Serving?

Kumaran — Thu, 21 Aug 2025 19:32:25 GMT

Hi @ecram,

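Thank you for contacting the Databricks community.

Per the documentation linked below, logs for AI Gateway-enabled inference tables are expected to be available within 1 hour of a request. The same page suggests reaching out to your Databricks account team for more information.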

https://docs.databricks.com/aws/en/ai-gateway/inference-tables#:~:text=You%20can%20expect%20logs%20to%20be%20available%20within%201%20hour%20of%20a%20request.%20Reach%20out%20to%20your%20Databricks%20account%20team%20for%20more%20information.
