Generative AI
Explore discussions on generative artificial intelligence techniques and applications within the Databricks Community. Share ideas, challenges, and breakthroughs in this cutting-edge field.

Databricks Foundation Model API rate limiting approach

llmnerd
New Contributor

Hi there,

Is this the correct approach to comply with the rate limit restrictions of the Foundation Model API?

from databricks_langchain import ChatDatabricks
from langchain_core.rate_limiters import InMemoryRateLimiter

# Client-side token-bucket limiter: at most ~2 requests per second,
# polled every 0.5 s, allowing bursts of up to 10 queued requests.
rate_limiter = InMemoryRateLimiter(
    requests_per_second=2.0,
    check_every_n_seconds=0.5,
    max_bucket_size=10,
)

chat_model = ChatDatabricks(
    endpoint=model,          # name of the Foundation Model serving endpoint
    temperature=temperature,
    max_tokens=max_tokens,
    rate_limiter=rate_limiter,
)
1 REPLY

Alberto_Umana
Databricks Employee

Hello @llmnerd,

Yes, the approach you have outlined is correct: passing an InMemoryRateLimiter from langchain_core to ChatDatabricks throttles requests on the client side before they reach the serving endpoint, which should keep you within the rate limits of the Foundation Model API. If you have any specific requirements or run into issues, please let us know.
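
For reference, here is a minimal sketch of how the limiter behaves once it is attached to the model. The endpoint name, prompt, and generation settings are placeholders; the assumed behavior is that of LangChain's token-bucket InMemoryRateLimiter, which blocks each invoke call until a request token is available:

import time

from databricks_langchain import ChatDatabricks
from langchain_core.rate_limiters import InMemoryRateLimiter

# Same settings as above: ~2 requests per second, bursts of up to 10.
rate_limiter = InMemoryRateLimiter(
    requests_per_second=2.0,
    check_every_n_seconds=0.5,
    max_bucket_size=10,
)

# Placeholder endpoint name; substitute the serving endpoint you use.
chat_model = ChatDatabricks(
    endpoint="databricks-meta-llama-3-3-70b-instruct",
    temperature=0.1,
    max_tokens=256,
    rate_limiter=rate_limiter,
)

# Fire several calls in a loop; the limiter spaces them out so the
# endpoint sees at most ~2 requests per second.
start = time.time()
for i in range(5):
    response = chat_model.invoke(f"Summarize request {i} in one sentence.")
    print(f"{time.time() - start:5.1f}s  {response.content[:60]}")

One caveat: InMemoryRateLimiter only limits the number of requests per unit time, not the number of tokens in those requests, so if your workload is bound by a token-based quota you will still need to size max_tokens and prompt length accordingly.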
