Databricks Foundation Model API rate limiting approach

llmnerd
New Contributor

Hi there,

Is this the correct approach for staying within the rate limits of the Foundation Model API?

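# The original post omitted this import; databricks_langchain is assumed here
from databricks_langchain import ChatDatabricks
from langchain_core.rate_limiters import InMemoryRateLimiter

rate_limiter = InMemoryRateLimiter(
    requests_per_second=2.0,    # average request rate allowed
    check_every_n_seconds=0.5,  # how often a waiting call re-checks for capacity
    max_bucket_size=10,         # largest burst allowed
)

# model, temperature, and max_tokens are defined elsewhere in my code
chat_model = ChatDatabricks(
    endpoint=model,
    temperature=temperature,
    max_tokens=max_tokens,
    rate_limiter=rate_limiter,
)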

Alberto_Umana
Databricks Employee

Hello @llmnerd,

Yes, the approach you have outlined, using InMemoryRateLimiter from langchain_core to stay within the Foundation Model API rate limits, appears to be correct. This setup should help you manage the rate limits effectively. If you have specific requirements or run into any issues, please let us know.
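
For anyone wondering what the three parameters actually control: the LangChain docs describe InMemoryRateLimiter as a token-bucket limiter that counts requests only (not tokens) and lives in a single process. Below is a minimal standalone sketch of that idea, not LangChain's actual code. The TokenBucket and FakeClock classes and the simulate function are hypothetical names made up for illustration, and the sketch assumes the bucket starts empty.

```python
import time


class TokenBucket:
    """Illustrative token bucket; a sketch, not LangChain's implementation."""

    def __init__(self, requests_per_second, max_bucket_size, clock=time.monotonic):
        self.rate = requests_per_second   # tokens added per second
        self.capacity = max_bucket_size   # cap on accumulated tokens (burst size)
        self.clock = clock
        self.tokens = 0.0                 # assumption: start empty, no initial burst
        self.last = clock()

    def _refill(self):
        # Credit tokens for the time elapsed since the last check, up to capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self):
        """Consume one token if available; return whether the request may go."""
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def acquire(self, check_every_n_seconds=0.5, sleep=time.sleep):
        # Blocking variant: poll until a token is available.
        while not self.try_acquire():
            sleep(check_every_n_seconds)


class FakeClock:
    """Manually advanced clock so the simulation is deterministic."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


def simulate():
    clock = FakeClock()
    bucket = TokenBucket(requests_per_second=2.0, max_bucket_size=10, clock=clock)
    results = {}
    # Empty bucket: the very first request has to wait.
    results["at_start"] = bucket.try_acquire()
    # One second at 2 req/s yields two tokens, so the third attempt fails.
    clock.advance(1.0)
    results["after_1s"] = [bucket.try_acquire() for _ in range(3)]
    # A long idle period only accumulates up to max_bucket_size tokens.
    clock.advance(60.0)
    results["burst_after_idle"] = sum(bucket.try_acquire() for _ in range(20))
    return results


if __name__ == "__main__":
    print(simulate())
```

With the values from the question, the sustained rate is about 2 requests per second, and after an idle period up to 10 requests can go out back to back. If the endpoint also enforces a token-based limit, or if several processes share the same endpoint, a per-process request limiter like this would not cover those cases on its own.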