@Suheb, take a look at TorchDistributor. It supports several distributed training options, including single-node multi-GPU training and multi-node training. References are below.
https://docs.databricks.com/aws/en/machine-...
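To make the two options concrete, here is a minimal sketch, assuming PySpark 3.4+ where `pyspark.ml.torch.distributor.TorchDistributor` is available. The helper names `distributor_settings`, `train_fn`, and `launch` are hypothetical and only for illustration; the training function is a placeholder, not a real training loop.

```python
def distributor_settings(num_processes: int, multi_node: bool, use_gpu: bool = True) -> dict:
    """Build keyword arguments for TorchDistributor (hypothetical helper).

    local_mode=True runs all processes on the driver node (single-node, multi-GPU);
    local_mode=False spreads the processes across the cluster's worker nodes.
    """
    if num_processes < 1:
        raise ValueError("num_processes must be at least 1")
    return {
        "num_processes": num_processes,
        "local_mode": not multi_node,
        "use_gpu": use_gpu,
    }


def train_fn(learning_rate: float) -> float:
    # Placeholder: a real function would set up torch.distributed,
    # build the model, and run the training loop here.
    return learning_rate


def launch(settings: dict, learning_rate: float = 1e-3):
    # Imported lazily so the helpers above can be used without Spark installed.
    from pyspark.ml.torch.distributor import TorchDistributor

    return TorchDistributor(**settings).run(train_fn, learning_rate)


# Single node with 4 GPUs versus 8 processes spread across worker nodes.
single_node = distributor_settings(num_processes=4, multi_node=False)
multi_node = distributor_settings(num_processes=8, multi_node=True)
```

On a cluster, `launch(single_node)` would start the run; the right `num_processes` value depends on how many GPUs or task slots your cluster actually has.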
@Suheb, it depends on your use case.
That said, if it fits and you already have agents from the list below, I would recommend starting with a multi-agent supervisor:
An existing Agent Bricks: Knowledge Assistant(/generative-ai/agent-bricks/knowledge-assist...
@Suheb, we have a number of resources on Databricks RAG. Have you had a chance to look at the available documentation? A few are listed below.
Databricks docs for RAG: https://docs.databricks.com/aws/en/generative-ai/retrieval-augmented-gener...
Hi @normk-sd, I'm not sure what your exact requirements are, but we have several new notebooks on MCP and multi-agent systems.
LangGraph/OpenAI MCP tool-calling agent:
https://docs.databricks.com/aws/en/generative-ai/mcp/managed-mcp
Multi-agent s...
@Rajat-TVSM, this error usually indicates a mismatch between your request payload and what the model expects.
However, if you have verified that the same payload no longer reproduces the issue, it will be challenging to identify the root cause. Y...