10-28-2025 01:34 PM
From what I understand from reading the documentation, the /api/2.0/serving-endpoints/{name}/ai-gateway endpoint supports a "tokens" and a "principals" attribute in the JSON payload.
Documentation link: Update AI Gateway of a serving endpoint | Serving endpoints API | REST API reference | Azure Databri...
When I call the API, I get the following output as part of the 200 response. Is this supported, or am I making an incorrect call somehow?
10-28-2025 09:25 PM
I have dug a bit deeper into this. These properties are supported, but not as top-level request body fields; instead, they are fields of the objects in the `rate_limits` array. The actual payload looks like:
```
{
"guardrails": { /* ... */ },
"inference_table_config": { /* ... */ },
"rate_limits": [
{
"renewal_period": "MINUTE|HOUR|DAY",
"calls": 100,
"tokens": 1000, // ← tokens supported HERE (in rate_limits)
"principal": "user@company.com", // ← principals supported HERE
"key": "USER|ENDPOINT"
}
],
"usage_tracking_config": { /* ... */ },
"fallback_config": { /* ... */ }
}
```
For example, to update the config for an ai-gateway resource, you would use:
```
curl -X PUT \
"https://<deployment url>/api/2.0/serving-endpoints/{name}/ai-gateway" \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"rate_limits": [
{
"renewal_period": "HOUR",
"calls": 100,
"tokens": 1000,
"principal": "user@company.com"
}
]
}'
```
Let me know how this goes
10-29-2025 07:14 AM
Thank you for the help. Judging from the result of the curl command, each `rate_limits` entry has to use either `calls` or `tokens`; you can't have both in the same entry. (I think the docs are wrong, or I misread them and thought you could pass both at once.)
10-29-2025 08:40 AM - edited 10-29-2025 08:48 AM
Here is the code that works, based on the above:
```
curl -X PUT \
  "${DATABRICKS_HOST}/api/2.0/serving-endpoints/databricks-claude-opus-4-1/ai-gateway" \
  -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "rate_limits": [
      {
        "key": "user",
        "renewal_period": "minute",
        "tokens": 99999
      },
      {
        "key": "user",
        "renewal_period": "minute",
        "calls": 9
      }
    ],
    "usage_tracking_config": { "enabled": true }
  }'
```
To get `principal` to work, the call should look like this, based on my experimentation:
```
curl -X PUT \
  "${DATABRICKS_HOST}/api/2.0/serving-endpoints/databricks-claude-opus-4-1/ai-gateway" \
  -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "rate_limits": [
      {
        "key": "user",
        "principal": "sfibich1@xyz.com",
        "renewal_period": "minute",
        "tokens": 99999
      },
      {
        "key": "user",
        "renewal_period": "minute",
        "calls": 9
      }
    ],
    "usage_tracking_config": { "enabled": true }
  }'
```
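In case it helps anyone scripting this, here is a rough sketch that wraps the pattern from this thread in a small helper: one `rate_limits` entry for `tokens` and a separate one for `calls`, with `principal` added only when supplied. The function name `build_payload` and the `ENDPOINT_NAME` variable are my own placeholders, not anything from the Databricks API or CLI, and the helper does no JSON escaping, so it assumes plain values such as an email address. The request is only sent when `DATABRICKS_HOST` and `DATABRICKS_TOKEN` are set; otherwise it just prints the payload so you can inspect it first.

```shell
#!/usr/bin/env bash
# Sketch only: build_payload and ENDPOINT_NAME are hypothetical names,
# not part of the Databricks CLI or REST API.

# Emit an ai-gateway payload with one rate_limits entry per limit type,
# because a single entry was found not to accept both "calls" and "tokens".
# Args: <token limit> <call limit> [principal]
build_payload() {
  local tokens="$1" calls="$2" principal="${3:-}"
  local principal_field=""
  if [ -n "$principal" ]; then
    # No JSON escaping here: assumes the principal has no quotes or backslashes.
    principal_field="\"principal\": \"${principal}\", "
  fi
  printf '{"rate_limits": [{"key": "user", %s"renewal_period": "minute", "tokens": %s}, {"key": "user", "renewal_period": "minute", "calls": %s}], "usage_tracking_config": {"enabled": true}}\n' \
    "$principal_field" "$tokens" "$calls"
}

payload="$(build_payload 99999 9 "user@company.com")"
echo "$payload"

# Only send the request when credentials are present in the environment.
if [ -n "${DATABRICKS_HOST:-}" ] && [ -n "${DATABRICKS_TOKEN:-}" ]; then
  curl -X PUT \
    "${DATABRICKS_HOST}/api/2.0/serving-endpoints/${ENDPOINT_NAME:?set ENDPOINT_NAME}/ai-gateway" \
    -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$payload"
fi
```

Calling `build_payload 99999 9` with no third argument produces the same shape as the first working call above, without the `principal` field.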