topic DataBricks Foundational model rate limiting approach in Generative AI

DataBricks Foundational model rate limiting approach

llmnerd — Tue, 12 Nov 2024 15:51:27 GMT

Hi there,

Is this the correct approach to stay within the rate limits of the Foundation Model API?

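from langchain_core.rate_limiters import InMemoryRateLimiter
# Import path assumed; ChatDatabricks ships in the databricks-langchain package
from databricks_langchain import ChatDatabricks

# Allow roughly 2 requests per second, polling for a free slot every 0.5 s,
# with bursts of at most 10 requests
rate_limiter = InMemoryRateLimiter(
    requests_per_second=2.0,
    check_every_n_seconds=0.5,
    max_bucket_size=10,
)

# model, temperature and max_tokens are defined elsewhere
chat_model = ChatDatabricks(
    endpoint=model,
    temperature=temperature,
    max_tokens=max_tokens,
    rate_limiter=rate_limiter,
)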

Re: DataBricks Foundational model rate limiting approach

Alberto_Umana — Tue, 12 Nov 2024 22:00:56 GMT

Hello @llmnerd,

Yes, the approach you have outlined looks correct. ChatDatabricks accepts the rate_limiter argument, and InMemoryRateLimiter from langchain_core blocks each call until a slot is available, so with your settings requests are held to about 2 per second, with bursts of up to 10 after an idle period. Two things to keep in mind: the limiter counts requests only, not tokens, and it lives in memory, so it applies only within a single process. If you have any specific requirements or run into any issues, please let us know.
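
To make the effect of requests_per_second and max_bucket_size more concrete, here is a minimal token-bucket sketch in plain Python. It is a simplified stand-in written for illustration, not the langchain_core source: the TokenBucket class, try_acquire, and count_allowed are hypothetical names, the bucket is assumed to start empty, and the clock is passed in by hand so the behaviour is easy to follow.

```python
class TokenBucket:
    """Illustrative token bucket (hypothetical, not the library class)."""

    def __init__(self, requests_per_second, max_bucket_size):
        self.rate = requests_per_second
        self.capacity = max_bucket_size
        self.tokens = 0.0   # assumption: the bucket starts empty
        self.last = None    # time of the previous refill

    def try_acquire(self, now):
        """Refill for the elapsed time, then take one token if available."""
        if self.last is None:
            self.last = now
        elapsed = now - self.last
        # Tokens accrue at `rate` per second but never exceed `capacity`.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def count_allowed(bucket, start, attempts, step):
    """Try `attempts` requests spaced `step` seconds apart; count successes."""
    return sum(bucket.try_acquire(start + i * step) for i in range(attempts))


if __name__ == "__main__":
    bucket = TokenBucket(requests_per_second=2.0, max_bucket_size=10)
    bucket.try_acquire(0.0)  # first call finds an empty bucket
    # After 60 s of idle time, 20 simultaneous attempts:
    print("burst allowed:", count_allowed(bucket, 60.0, 20, 0.0))
    # Then 20 attempts spread over the next 5 s:
    print("steady allowed:", count_allowed(bucket, 60.25, 20, 0.25))
```

In this sketch, idle time fills the bucket up to max_bucket_size, so that many requests can go out back to back before the steady rate of requests_per_second takes over. If the endpoint's limit does not tolerate bursts of that size, a smaller max_bucket_size may be the safer choice.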