3 weeks ago
From my reading of the documentation, the /api/2.0/serving-endpoints/{name}/ai-gateway endpoint supports a "tokens" and a "principals" attribute in the JSON payload.
Documentation link: Update AI Gateway of a serving endpoint | Serving endpoints API | REST API reference | Azure Databri...
When I call the API, I get the following output as part of a 200 response. Is this supported, or am I making an incorrect call somehow?
3 weeks ago
I have dug a bit deeper on this. These properties are supported, but not as top-level request body fields; instead, they are fields of the objects in the `rate_limits` array. The actual payload looks like:
```
{
"guardrails": { /* ... */ },
"inference_table_config": { /* ... */ },
"rate_limits": [
{
"renewal_period": "MINUTE|HOUR|DAY",
"calls": 100,
"tokens": 1000, // ← tokens supported HERE (in rate_limits)
"principal": "user@company.com", // ← principals supported HERE
"key": "USER|ENDPOINT"
}
],
"usage_tracking_config": { /* ... */ },
"fallback_config": { /* ... */ }
}
```
For example, to update the config for an ai-gateway resource, you would use:
```
curl -X PUT \
"https://<deployment url>/api/2.0/serving-endpoints/{name}/ai-gateway" \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"rate_limits": [
{
"renewal_period": "HOUR",
"calls": 100,
"tokens": 1000,
"principal": "user@company.com"
}
]
}'
```
Let me know how this goes.
3 weeks ago
Thank you for the help. From the result of the curl command, it looks like each `rate_limits` entry has to use either `calls` or `tokens`; you can't have both in the same entry. (I think the docs are wrong, or I misread them and thought you could pass both at once.)
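To make the "either calls or tokens per entry" observation concrete, here is a minimal shell sketch that builds the payload from single-limit entries. The helper name `rate_limit_entry` is hypothetical (it is not part of any Databricks tooling), and the one-limit-per-entry rule is taken from the behaviour reported in this thread, not from the documentation.

```shell
#!/bin/sh
# Hypothetical helper: emit ONE rate_limits entry that carries either
# "calls" or "tokens", never both (per the behaviour observed in this thread).
rate_limit_entry() {
  local kind="$1" limit="$2" period="$3"
  case "$kind" in
    calls|tokens) ;;
    *) echo "kind must be 'calls' or 'tokens', got '$kind'" >&2; return 1 ;;
  esac
  printf '{"key":"user","renewal_period":"%s","%s":%s}' "$period" "$kind" "$limit"
}

# A token limit and a call limit go in as two separate array entries.
payload=$(printf '{"rate_limits":[%s,%s]}' \
  "$(rate_limit_entry tokens 99999 minute)" \
  "$(rate_limit_entry calls 9 minute)")

echo "$payload"
```

The resulting string can be passed to `curl -d "$payload"` on the PUT call shown earlier in the thread.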
3 weeks ago - last edited 3 weeks ago
Here is the code that works, based on the above:
```
curl -X PUT \
  "${DATABRICKS_HOST}/api/2.0/serving-endpoints/databricks-claude-opus-4-1/ai-gateway" \
  -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "rate_limits": [
      {
        "key": "user",
        "renewal_period": "minute",
        "tokens": 99999
      },
      {
        "key": "user",
        "renewal_period": "minute",
        "calls": 9
      }
    ],
    "usage_tracking_config": { "enabled": true }
  }'
```
To get `principal` to work, the call should look like this, based on my experimentation:
```
curl -X PUT \
  "${DATABRICKS_HOST}/api/2.0/serving-endpoints/databricks-claude-opus-4-1/ai-gateway" \
  -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "rate_limits": [
      {
        "key": "user",
        "principal": "sfibich1@xyz.com",
        "renewal_period": "minute",
        "tokens": 99999
      },
      {
        "key": "user",
        "renewal_period": "minute",
        "calls": 9
      }
    ],
    "usage_tracking_config": { "enabled": true }
  }'
```
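After a PUT like the ones above, it can help to read the endpoint back and confirm what was stored. The following is only a sketch: the helper names `endpoint_url` and `get_endpoint` are hypothetical, and it assumes that `GET /api/2.0/serving-endpoints/{name}` returns the gateway settings under an `ai_gateway` field, which is worth checking against the API reference for your workspace.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; DATABRICKS_HOST and
# DATABRICKS_TOKEN are the same environment variables used in the calls above.

# Build the URL of a serving endpoint from the workspace host and endpoint name.
endpoint_url() {
  local host="${1%/}" name="$2"   # drop one trailing slash from the host, if any
  printf '%s/api/2.0/serving-endpoints/%s' "$host" "$name"
}

# Fetch the endpoint definition. The AI Gateway settings are ASSUMED to appear
# under an "ai_gateway" field in the JSON response.
get_endpoint() {
  curl -sS -X GET "$(endpoint_url "$DATABRICKS_HOST" "$1")" \
    -H "Authorization: Bearer ${DATABRICKS_TOKEN}"
}

# Example (needs network access and valid credentials):
#   get_endpoint databricks-claude-opus-4-1
```

Comparing the returned `rate_limits` with the payload you sent is a quick way to see whether an entry was accepted or silently changed.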