Hi Databricks Team / Community,
I’m encountering a 500 Internal Server Error when calling an Agent Bricks MAS endpoint in my workspace. The error message is:
500 Internal Error. Please try again later. If this issue persists, please contact Databricks support.
Context:
- I have deployed a multi-agent supervisor using Agent Bricks and exposed it as a serving endpoint.
- I tried with 1~3 agents and all of them give the same error. Testing the agent endpoints separately works fine.
Troubleshooting I’ve Tried:
- Verified workspace permissions; the token/user has access to all referenced models and tools.
- Checked cluster status; compute resources appear healthy.
- Re-deployed the endpoint to ensure the latest agent version is active.
- Tested with smaller payloads.
I would appreciate guidance on:
- What could cause a 500 Internal Error in Agent Bricks endpoints?
- How to reliably debug or capture detailed logs for such failures.
- Any known limitations or workarounds for multi-agent endpoints causing 500 errors.
Thank you in advance for any help or insights!