by dk_g • New Contributor
- 350 Views
- 0 replies
- 0 kudos
Hi, I am using a GPU cluster (driver: 1 GPU, worker: 3 GPU) and caching model data in Unity Catalog, but loading the model checkpoint shards always uses the driver's memory and fails due to insufficient memory. How can I use the complete cluster GPU while loadi...
- 5002 Views
- 10 replies
- 7 kudos
Hi Community, I’m currently working on a Retrieval-Augmented Generation (RAG) use case in Databricks. I’ve successfully implemented and served a model that uses a single Vector Search index, and everything works as expected. However, when I try to serv...
- 5772 Views
- 4 replies
- 0 kudos
Hi everyone. I am working on a graph that uses the MemorySaver class to add short-term memory. This will let me maintain a multi-turn conversation with the user by storing the chat history. I am using the MLflow "models from code" fea...
Latest Reply
Hi @moemedina. No, I didn't. I'm considering using the ChatModel/ChatAgent class to wrap the graph so that I can move on. However, the MLflow documentation still refers to ChatModel, whereas ChatAgent is the latest recommendation: MLflow ChatModel Doc...