<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: ai_query not affected by AI gateway's rate limits? in Generative AI</title>
    <link>https://community.databricks.com/t5/generative-ai/ai-query-not-affected-by-ai-gateway-s-rate-limits/m-p/134302#M1199</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/136433"&gt;@PiotrM&lt;/a&gt;&amp;nbsp;,&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/146924"&gt;@BS_THE_ANALYST&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;I guess that's the whole problem here.&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/136433"&gt;@PiotrM&lt;/a&gt;&amp;nbsp;correctly identified and configured the right tool for his goal: AI Gateway.&lt;BR /&gt;My guess is that the ai_query function internally uses some shortcut to communicate with the endpoint. That could explain why the rate limit works when you call the endpoint directly, but doesn’t when you use ai_query.&lt;/P&gt;</description>
    <pubDate>Thu, 09 Oct 2025 06:02:50 GMT</pubDate>
    <dc:creator>szymon_dybczak</dc:creator>
    <dc:date>2025-10-09T06:02:50Z</dc:date>
    <item>
      <title>ai_query not affected by AI gateway's rate limits?</title>
      <link>https://community.databricks.com/t5/generative-ai/ai-query-not-affected-by-ai-gateway-s-rate-limits/m-p/134257#M1196</link>
      <description>&lt;P&gt;Hey,&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;We've been testing ai_query (Azure Databricks here) on preconfigured model serving endpoints like&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;databricks-meta-llama-3-3-70b-instruct&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;and the initial results look nice.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;I'm trying to limit the number of requests that can be sent to those endpoints, so the cloud spend won't spiral out of control. &lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;The AI Gateway seems to have the capability to limit tokens/queries per minute, which would be exactly what we're looking for, but it doesn't seem to affect ai_query calls to the endpoint, even though it successfully limits requests sent through the REST API.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Is this the intended behavior? If so, are there any other options to properly limit the usage of ai_query, apart from monitoring it using system tables/logs?&amp;nbsp;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Best regards,&amp;nbsp;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;Piotr&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 08 Oct 2025 17:58:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/ai-query-not-affected-by-ai-gateway-s-rate-limits/m-p/134257#M1196</guid>
      <dc:creator>PiotrM</dc:creator>
      <dc:date>2025-10-08T17:58:40Z</dc:date>
    </item>
    <item>
      <title>Re: ai_query not affected by AI gateway's rate limits?</title>
      <link>https://community.databricks.com/t5/generative-ai/ai-query-not-affected-by-ai-gateway-s-rate-limits/m-p/134280#M1197</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/136433"&gt;@PiotrM&lt;/a&gt;,&lt;BR /&gt;&lt;BR /&gt;Firstly, have you checked out the docs on managing model serving endpoints?&amp;nbsp;&lt;BR /&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/machine-learning/model-serving/manage-serving-endpoints" target="_blank"&gt;https://learn.microsoft.com/en-us/azure/databricks/machine-learning/model-serving/manage-serving-endpoints&lt;/A&gt;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;I just had a read through. You can certainly set up budgets to monitor them, which can help prevent costs from spiralling! &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; I appreciate you've already mentioned the system tables.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;This article seems &lt;STRONG&gt;really&amp;nbsp;&lt;/STRONG&gt;promising:&amp;nbsp;&lt;A href="https://docs.databricks.com/aws/en/ai-gateway/configure-ai-gateway-endpoints" target="_blank"&gt;https://docs.databricks.com/aws/en/ai-gateway/configure-ai-gateway-endpoints&lt;/A&gt;&amp;nbsp;&lt;span class="lia-unicode-emoji" title=":eyes:"&gt;👀&lt;/span&gt;&lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;... 
(I'm certain we've got to be onto a winner with this)&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="BS_THE_ANALYST_1-1759953536854.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/20585i4A3F3EBF6F39F3CF/image-size/medium?v=v2&amp;amp;px=400" role="button" title="BS_THE_ANALYST_1-1759953536854.png" alt="BS_THE_ANALYST_1-1759953536854.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="BS_THE_ANALYST_0-1759953226614.png" style="width: 999px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/20584i2CEE7861DDF0F8AF/image-size/large?v=v2&amp;amp;px=999" role="button" title="BS_THE_ANALYST_0-1759953226614.png" alt="BS_THE_ANALYST_0-1759953226614.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;If that doesn't quite cut the mustard, perhaps we could also look at the actual token usage per user. Perhaps this can be throttled somehow &lt;span class="lia-unicode-emoji" title=":thinking_face:"&gt;🤔&lt;/span&gt;.&lt;BR /&gt;&lt;BR /&gt;All the best,&lt;BR /&gt;BS&lt;/P&gt;</description>
      <pubDate>Wed, 08 Oct 2025 19:59:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/ai-query-not-affected-by-ai-gateway-s-rate-limits/m-p/134280#M1197</guid>
      <dc:creator>BS_THE_ANALYST</dc:creator>
      <dc:date>2025-10-08T19:59:26Z</dc:date>
    </item>
    <item>
      <title>Re: ai_query not affected by AI gateway's rate limits?</title>
      <link>https://community.databricks.com/t5/generative-ai/ai-query-not-affected-by-ai-gateway-s-rate-limits/m-p/134302#M1199</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/136433"&gt;@PiotrM&lt;/a&gt;&amp;nbsp;,&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/146924"&gt;@BS_THE_ANALYST&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;I guess that's the whole problem here.&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/136433"&gt;@PiotrM&lt;/a&gt;&amp;nbsp;correctly identified and configured the right tool for his goal: AI Gateway.&lt;BR /&gt;My guess is that the ai_query function internally uses some shortcut to communicate with the endpoint. That could explain why the rate limit works when you call the endpoint directly, but doesn’t when you use ai_query.&lt;/P&gt;</description>
      <pubDate>Thu, 09 Oct 2025 06:02:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/ai-query-not-affected-by-ai-gateway-s-rate-limits/m-p/134302#M1199</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-10-09T06:02:50Z</dc:date>
    </item>
    <item>
      <title>Re: ai_query not affected by AI gateway's rate limits?</title>
      <link>https://community.databricks.com/t5/generative-ai/ai-query-not-affected-by-ai-gateway-s-rate-limits/m-p/134310#M1200</link>
      <description>&lt;P&gt;Hey,&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/146924"&gt;@BS_THE_ANALYST&lt;/a&gt;, before writing that post, I went through exactly the docs you've posted. I wasn't able to find a specific confirmation (or denial) that this function would be affected by the rate limits, which led me to believe that it was worth a shot.&lt;/P&gt;&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/110502"&gt;@szymon_dybczak&lt;/a&gt;&amp;nbsp;Thank you. My guess exactly. On Azure it's still in Public Preview, so maybe it'll be added in the future.&amp;nbsp;&lt;/P&gt;&lt;P&gt;BR,&amp;nbsp;&lt;/P&gt;&lt;P&gt;Piotr&lt;/P&gt;</description>
      <pubDate>Thu, 09 Oct 2025 06:57:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/ai-query-not-affected-by-ai-gateway-s-rate-limits/m-p/134310#M1200</guid>
      <dc:creator>PiotrM</dc:creator>
      <dc:date>2025-10-09T06:57:15Z</dc:date>
    </item>
    <item>
      <title>Re: ai_query not affected by AI gateway's rate limits?</title>
      <link>https://community.databricks.com/t5/generative-ai/ai-query-not-affected-by-ai-gateway-s-rate-limits/m-p/134312#M1201</link>
      <description>&lt;P&gt;Yep, let's wait for a Databricks employee to join the discussion. Maybe they will shed some light on why it's not working as expected. You did everything correctly on your side. If the endpoint accessed via ai_query is not subject to the API rate limit, it should be clearly stated in the documentation.&lt;/P&gt;</description>
      <pubDate>Thu, 09 Oct 2025 07:04:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/ai-query-not-affected-by-ai-gateway-s-rate-limits/m-p/134312#M1201</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-10-09T07:04:09Z</dc:date>
    </item>
    <item>
      <title>Re: ai_query not affected by AI gateway's rate limits?</title>
      <link>https://community.databricks.com/t5/generative-ai/ai-query-not-affected-by-ai-gateway-s-rate-limits/m-p/134427#M1202</link>
      <description>&lt;P&gt;Hey guys,&lt;/P&gt;
&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/136433"&gt;@PiotrM&lt;/a&gt;&amp;nbsp;AI Gateway does not currently enforce rate limiting on ai_query batch inference workloads; it only provides usage tracking, which is called out in the &lt;A href="https://learn.microsoft.com/en-us/azure/databricks/ai-gateway/#:~:text=Only%20usage%20tracking%20is%20supported%20for%20batch%20inference%20workloads%20on%20pay%2Dper%2Dtoken%20endpoints%20that%20have%20AI%20Gateway%20features%20enabled.%20In%20the%20endpoint_usage%20system%20table%20only%20the%20rows%20corresponding%20to%20the%20batch%20inference%20request%20are%20visible." target="_blank" rel="noopener"&gt;docs on limitations&lt;/A&gt;.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For cost control, you could &lt;A href="https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/functions/ai_query#:~:text=The%20name%20of%20a%20Databricks%20Foundation%20Model%20serving%20endpoint%2C%20an%20external%20model%20serving%20endpoint%20or%20a%20custom%20model%20endpoint%20in%20the%20same%20workspace%20for%20invocations%20as%20a%20STRING%20literal.%20The%20definer%20must%20have%20CAN%20QUERY%20permission%20on%20the%20endpoint." target="_blank" rel="noopener"&gt;control permissions on the endpoint&lt;/A&gt; and/or&amp;nbsp;set up &lt;A href="https://learn.microsoft.com/en-us/azure/databricks/admin/system-tables/" target="_blank" rel="noopener"&gt;system table monitoring&lt;/A&gt; or &lt;A href="https://learn.microsoft.com/en-us/azure/databricks/sql/user/alerts/" target="_blank" rel="noopener"&gt;SQL alerts&lt;/A&gt; with something like:&amp;nbsp;&lt;BR /&gt;```&lt;BR /&gt;SELECT&lt;BR /&gt;user_id,&lt;BR /&gt;endpoint_name,&lt;BR /&gt;SUM(num_tokens) AS total_tokens,&lt;BR /&gt;COUNT(*) AS total_requests,&lt;BR /&gt;MIN(request_time) AS first_request,&lt;BR /&gt;MAX(request_time) AS last_request&lt;BR /&gt;FROM system.serving.endpoint_usage&lt;BR /&gt;WHERE endpoint_name = '&amp;lt;your_endpoint_name&amp;gt;'&lt;BR /&gt;AND request_time &amp;gt;= CURRENT_DATE() -- adjust time window as needed&lt;BR /&gt;GROUP BY user_id, endpoint_name&lt;BR /&gt;ORDER BY total_tokens DESC;&lt;BR /&gt;```&lt;/P&gt;
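&lt;P&gt;To make the alerting idea concrete, here is a minimal Python sketch, not an official Databricks example. It assumes the rows returned by a query like the one above have already been fetched into memory. The UsageRow class, the users_over_budget helper, the sample values, and the 100,000-token budget are all hypothetical, and the field names simply follow the column aliases used in the query.&lt;/P&gt;

```python
from dataclasses import dataclass


@dataclass
class UsageRow:
    # Hypothetical container; fields follow the aliases in the usage query.
    user_id: str
    endpoint_name: str
    total_tokens: int
    total_requests: int


def users_over_budget(rows, token_budget):
    """Return (user_id, total_tokens) pairs above the budget, largest first."""
    flagged = [
        (row.user_id, row.total_tokens)
        for row in rows
        if row.total_tokens > token_budget
    ]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Made-up sample rows standing in for a real query result.
sample_rows = [
    UsageRow("alice", "my-endpoint", 120_000, 40),
    UsageRow("bob", "my-endpoint", 8_000, 5),
    UsageRow("carol", "my-endpoint", 560_000, 210),
]

# Assumed daily budget of 100,000 tokens per user.
over = users_over_budget(sample_rows, token_budget=100_000)
print(over)
```

&lt;P&gt;Anyone flagged this way could then have their CAN QUERY permission on the endpoint reviewed, since AI Gateway does not currently throttle ai_query traffic on its own.&lt;/P&gt;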
&lt;P&gt;I hope this helps. If this and the other replies resolve the issue for you, please use the "Accept as Solution" button to let us know!&lt;/P&gt;
&lt;P&gt;-James&lt;/P&gt;</description>
      <pubDate>Thu, 09 Oct 2025 15:53:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/ai-query-not-affected-by-ai-gateway-s-rate-limits/m-p/134427#M1202</guid>
      <dc:creator>jamesl</dc:creator>
      <dc:date>2025-10-09T15:53:05Z</dc:date>
    </item>
    <item>
      <title>Re: ai_query not affected by AI gateway's rate limits?</title>
      <link>https://community.databricks.com/t5/generative-ai/ai-query-not-affected-by-ai-gateway-s-rate-limits/m-p/134428#M1203</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/127181"&gt;@jamesl&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;Thanks for clearing up our doubts, that's exactly what we were looking for. Maybe it would be a good idea to add a small note about this to the&amp;nbsp;AI Gateway documentation?&lt;/P&gt;</description>
      <pubDate>Thu, 09 Oct 2025 16:33:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/ai-query-not-affected-by-ai-gateway-s-rate-limits/m-p/134428#M1203</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-10-09T16:33:30Z</dc:date>
    </item>
    <item>
      <title>Re: ai_query not affected by AI gateway's rate limits?</title>
      <link>https://community.databricks.com/t5/generative-ai/ai-query-not-affected-by-ai-gateway-s-rate-limits/m-p/134461#M1204</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/127181"&gt;@jamesl&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Thank you very much. This resolves my question. That specific sentence in the AI Gateway docs must have gone over my head, but it's clear now.&lt;/P&gt;&lt;P&gt;BR,&amp;nbsp;&lt;/P&gt;&lt;P&gt;Piotr&lt;/P&gt;</description>
      <pubDate>Thu, 09 Oct 2025 19:48:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/ai-query-not-affected-by-ai-gateway-s-rate-limits/m-p/134461#M1204</guid>
      <dc:creator>PiotrM</dc:creator>
      <dc:date>2025-10-09T19:48:52Z</dc:date>
    </item>
  </channel>
</rss>

