ai_query not affected by AI gateway's rate limits?

PiotrM · ‎10-08-2025

Hey,

We've been testing the ai_query (Azure Databricks here) on preconfigured model serving endpoints like

databricks-meta-llama-3-3-70b-instruct and the initial results look nice.

I'm trying to limit the number of requests that could be sent to those endpoints, so the cloud spend won't spiral out of control.

The AI gateway seems to have the capability to limit the tokens/queries per minute which would be exactly what we're looking for, but it seems to not affect the ai_query functions calling the endpoint, despite successfully limiting the requests from Rest API?.

Is it the intended behavior? If so, are there any other options to properly limit the usage of ai_query apart from being able to monitor it using system tables/logs?

Best regards,

Piotr