Wednesday
Hey,
We've been testing the ai_query function (Azure Databricks here) on preconfigured model serving endpoints like
Wednesday
Hey @PiotrM,
Firstly, have you checked the docs out for Managing Model Serving Endpoints?
https://learn.microsoft.com/en-us/azure/databricks/machine-learning/model-serving/manage-serving-end...
I just had a read through. You can certainly set up budgets to monitor them, which can help prevent costs spiralling! 🙂 I appreciate you've already mentioned the system tables.
This article seems really promising: https://docs.databricks.com/aws/en/ai-gateway/configure-ai-gateway-endpoints 👀🙂... (I'm certain we've got to be onto a winner with this)
If that doesn't quite cut the mustard, perhaps we could also look at the actual token usage per user and see whether that can be throttled somehow 🤔.
All the best,
BS
Wednesday
Hi @PiotrM , @BS_THE_ANALYST ,
I guess that's the whole problem here. @PiotrM correctly identified and configured the tool to achieve his goal: AI Gateway.
My guess is that the ai_query function internally uses some shortcut to communicate with the endpoint. That could explain why the rate limit works when you call the endpoint directly, but doesn't when you go through ai_query.
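For anyone who wants to reproduce the comparison, a batch call through ai_query looks something like the sketch below (the endpoint and table names are placeholders, not from this thread). Each row produces a request to the serving endpoint, and this is the path where the configured rate limit doesn't seem to be applied.
```
-- Hypothetical batch inference call; 'my-endpoint' and
-- my_catalog.my_schema.prompts are placeholder names.
-- Each row sends one request to the serving endpoint.
SELECT
  prompt,
  ai_query('my-endpoint', prompt) AS response
FROM my_catalog.my_schema.prompts;
```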
Wednesday
Hey,
@BS_THE_ANALYST, before writing that post, I went through exactly the docs you've posted. I wasn't able to find specific confirmation (or denial) that this function would be affected by the rate limits, which led me to believe it was worth a shot.
@szymon_dybczak Thank you. My guess exactly. On Azure it's still in Public Preview, so maybe it'll be added in the future.
BR,
Piotr
Thursday
Yep, let's wait for a Databricks employee to join the discussion. Maybe they will shed some light on why it's not working as expected. You did everything correctly on your side. If the endpoint accessed via ai_query is not subject to the API rate limit, it should be clearly stated in the documentation.
Thursday
Hey guys,
@PiotrM AI Gateway does not currently enforce rate limiting on ai_query batch inference workloads; it only provides usage tracking. This is called out in the limitations section of the docs.
For cost control, you could restrict permissions on the endpoint and/or monitor the system tables or set up SQL alerts with something like:
```
SELECT
  u.requester,
  se.endpoint_name,
  -- token counts can be NULL for some requests, hence the COALESCEs
  SUM(COALESCE(u.input_token_count, 0) + COALESCE(u.output_token_count, 0)) AS total_tokens,
  COUNT(*) AS total_requests,
  MIN(u.request_time) AS first_request,
  MAX(u.request_time) AS last_request
FROM system.serving.endpoint_usage AS u
-- endpoint_usage has no endpoint_name column; join served_entities to filter by it
JOIN system.serving.served_entities AS se
  ON u.served_entity_id = se.served_entity_id
WHERE se.endpoint_name = '<your_endpoint_name>'
  AND u.request_time >= CURRENT_DATE() -- adjust time window as needed
GROUP BY u.requester, se.endpoint_name
ORDER BY total_tokens DESC;
```
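To wire this into a SQL alert, a variant that only returns rows once a user goes over a daily token budget works well. This is just a sketch; the 100000 threshold is an arbitrary placeholder.
```
-- Alert-friendly sketch: returns rows only for users over a daily token
-- budget; the 100000 threshold and endpoint name are placeholders.
SELECT
  u.requester,
  SUM(COALESCE(u.input_token_count, 0) + COALESCE(u.output_token_count, 0)) AS tokens_today
FROM system.serving.endpoint_usage AS u
JOIN system.serving.served_entities AS se
  ON u.served_entity_id = se.served_entity_id
WHERE se.endpoint_name = '<your_endpoint_name>'
  AND u.request_time >= CURRENT_DATE()
GROUP BY u.requester
HAVING SUM(COALESCE(u.input_token_count, 0) + COALESCE(u.output_token_count, 0)) > 100000;
```
An alert set to fire whenever this query returns any rows gives you a per-user budget check even though the rate limit itself isn't enforced.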
I hope this helps. If this and the other replies resolve the issue for you, please use the "Accept as Solution" button to let us know!
-James
Thursday - last edited Thursday
Hi @jamesl ,
Thanks for clarifying our doubts; that's exactly what we were looking for. Maybe it would be a good idea to add a small note about this to the AI Gateway documentation?
Thursday
Hi @jamesl,
Thank you very much, this resolves my question. That specific sentence in the AI Gateway docs may have gone over my head, but it's clear now.
BR,
Piotr