Administration & Architecture

API call to /api/2.0/serving-endpoints/{name}/ai-gateway does not support tokens or principals

sfibich1
New Contributor

From my reading of the documentation, the `/api/2.0/serving-endpoints/{name}/ai-gateway` endpoint supports `tokens` and `principals` attributes in the JSON payload.

Documentation link: Update AI Gateway of a serving endpoint | Serving endpoints API | REST API reference | Azure Databri...

When I call the API, I get the following output as part of a 200 response. Is this supported, or am I making the call incorrectly?

[Screenshot of the API response]

 

1 ACCEPTED SOLUTION


jeffreyaven
Databricks Employee

I dug a bit deeper into this. These properties are supported, but not as top-level request body fields; instead, they are fields of each object in the `rate_limits` array. The payload looks like this:

```
{
  "guardrails": { /* ... */ },
  "inference_table_config": { /* ... */ },
  "rate_limits": [
    {
      "renewal_period": "minute",
      "tokens": 1000,                  // ← tokens supported HERE (in rate_limits); use "calls" instead for a call-count limit
      "principal": "user@company.com", // ← principal (singular) supported HERE (in rate_limits)
      "key": "user"                    // e.g. "user" or "endpoint"
    }
  ],
  "usage_tracking_config": { /* ... */ },
  "fallback_config": { /* ... */ }
}
```
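As a minimal sketch of the nesting described above, the POSIX shell function below prints a single `rate_limits` entry, with `tokens` or `calls` and an optional `principal` placed inside the entry rather than at the top level of the request body. The name `rate_limit_entry` is hypothetical, the lowercase `"user"`/`"minute"` values follow the working examples later in this thread, and emitting only one of `calls` or `tokens` per entry follows the experimentation reported in the replies rather than documented behavior. It does no JSON escaping, so it assumes plain values such as an email address.

```bash
#!/bin/sh
# Hypothetical helper: print one entry of the "rate_limits" array.
# Usage: rate_limit_entry <key> <renewal_period> <calls|tokens> <limit> [principal]
# Assumes the values need no JSON escaping (e.g. a plain email address).
rate_limit_entry() {
  local key="$1" period="$2" kind="$3" limit="$4" principal="${5:-}"

  # Emit exactly one limit field per entry: "calls" or "tokens", never both.
  case "$kind" in
    calls|tokens) ;;
    *) echo "kind must be 'calls' or 'tokens', got '$kind'" >&2; return 1 ;;
  esac

  # Reject empty, non-numeric, zero, or zero-padded limits.
  case "$limit" in
    ''|*[!0-9]*|0*) echo "limit must be a positive integer, got '$limit'" >&2; return 1 ;;
  esac

  # "principal" is optional and, when present, sits inside the entry.
  local principal_field=""
  if [ -n "$principal" ]; then
    principal_field="\"principal\": \"$principal\", "
  fi

  printf '{"key": "%s", %s"renewal_period": "%s", "%s": %s}\n' \
    "$key" "$principal_field" "$period" "$kind" "$limit"
}

# Example: a per-minute token limit for one user.
rate_limit_entry user minute tokens 1000 user@company.com
```

Because the limit type is a single argument, the function cannot produce an entry carrying both `calls` and `tokens`; a combined policy is expressed as two separate entries in the array.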

For example, to update the AI Gateway configuration of a serving endpoint, you would use:

```
curl -X PUT \
  "https://<deployment url>/api/2.0/serving-endpoints/{name}/ai-gateway" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "rate_limits": [
      {
        "key": "user",
        "principal": "user@company.com",
        "renewal_period": "minute",
        "tokens": 1000
      },
      {
        "key": "user",
        "renewal_period": "minute",
        "calls": 100
      }
    ]
  }'
```

Let me know how this goes.


3 REPLIES


Thank you for the help! Based on the result of the curl command, it looks like each `rate_limits` entry has to specify either `calls` or `tokens`; you can't have both in the same entry. (Either the docs are wrong, or I misread them and thought you could pass both at once.)

Here is the code that works, based on the above:

```
# Two separate entries: a per-user token limit and a per-user call limit
curl -X PUT \
  "${DATABRICKS_HOST}/api/2.0/serving-endpoints/databricks-claude-opus-4-1/ai-gateway" \
  -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "rate_limits": [
      {
        "key": "user",
        "renewal_period": "minute",
        "tokens": 99999
      },
      {
        "key": "user",
        "renewal_period": "minute",
        "calls": 9
      }
    ],
    "usage_tracking_config": { "enabled": true }
  }'
```

Based on my experimentation, to get `principal` to work, the call should look like this:

```
# Token limit scoped to one principal, plus a separate per-user call limit
curl -X PUT \
  "${DATABRICKS_HOST}/api/2.0/serving-endpoints/databricks-claude-opus-4-1/ai-gateway" \
  -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "rate_limits": [
      {
        "key": "user",
        "principal": "sfibich1@xyz.com",
        "renewal_period": "minute",
        "tokens": 99999
      },
      {
        "key": "user",
        "renewal_period": "minute",
        "calls": 9
      }
    ],
    "usage_tracking_config": { "enabled": true }
  }'
```
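Building on the principal example above, here is a hedged sketch that generates the same payload shape for several principals at once: one token-limited entry per principal, a single call-limited entry without a principal, and `usage_tracking_config`. The function `build_payload` is hypothetical; it hard-codes `"key": "user"` and `"renewal_period": "minute"` from the example above and assumes principals need no JSON escaping. It only covers the shape this thread reports as working.

```bash
#!/bin/sh
# Hypothetical helper: build an AI Gateway request body that gives each listed
# principal its own per-minute token limit, plus one per-user call limit with no
# principal, following the shape of the working example above.
# Usage: build_payload <tokens_per_minute> <calls_per_minute> [principal ...]
# Assumes principals need no JSON escaping.
build_payload() {
  if [ "$#" -lt 2 ]; then
    echo "usage: build_payload <tokens> <calls> [principal ...]" >&2
    return 1
  fi
  local tokens="$1" calls="$2" entries="" p
  shift 2

  # One token-limited entry per principal ("tokens" only, no "calls").
  for p in "$@"; do
    entries="${entries}{\"key\": \"user\", \"principal\": \"$p\", \"renewal_period\": \"minute\", \"tokens\": $tokens}, "
  done

  # A separate call-limited entry ("calls" only, no "tokens").
  entries="${entries}{\"key\": \"user\", \"renewal_period\": \"minute\", \"calls\": $calls}"

  printf '{"rate_limits": [%s], "usage_tracking_config": {"enabled": true}}\n' "$entries"
}

# Example: one principal with a token limit, plus a call limit for every user.
build_payload 99999 9 user@company.com
```

To send it, the output could be piped into the same `curl -X PUT` call used in this thread, with the inline `-d '{...}'` body replaced by `-d @-`, which tells curl to read the request body from standard input.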