One of the most valuable capabilities in Agent Bricks is the Experiments tab, where Databricks automatically evaluates different agent configurations and optimization runs using LLM-as-a-Judge techniques. Instead of relying only on traditional metrics such as accuracy or loss, Agent Bricks uses AI judges to assess the quality of generated responses against task-specific evaluation criteria.
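To make the idea concrete, here is a minimal sketch of how an LLM-as-a-Judge comparison between two agent configurations might be structured. It is not Agent Bricks' implementation and does not call any Databricks API. The names `Criterion`, `stub_judge`, and `evaluate_config` are hypothetical, and the judge is a simple keyword heuristic standing in for a real LLM call so that the example runs offline.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class Criterion:
    """A task-specific evaluation criterion (hypothetical structure)."""
    name: str
    description: str


# A judge maps (question, response, criterion) to a score in [0, 1].
Judge = Callable[[str, str, Criterion], float]


def stub_judge(question: str, response: str, criterion: Criterion) -> float:
    """Stand-in for an LLM judge (assumption: a real judge would prompt a
    model with the criterion description and parse its verdict)."""
    if criterion.name == "conciseness":
        # Pass if the response is at most 20 words long.
        return 1.0 if len(response.split()) <= 20 else 0.0
    if criterion.name == "relevance":
        # Pass if the response reuses any longer word from the question.
        q_terms = {w.lower().strip("?.,!") for w in question.split() if len(w) > 3}
        r_terms = {w.lower().strip("?.,!") for w in response.split()}
        return 1.0 if q_terms & r_terms else 0.0
    return 0.0  # Unknown criteria are scored as failing.


def evaluate_config(
    responses: Dict[str, str], criteria: List[Criterion], judge: Judge
) -> Dict[str, float]:
    """Average the judge's score per criterion over all question/response pairs."""
    return {
        c.name: mean(judge(q, r, c) for q, r in responses.items())
        for c in criteria
    }


criteria = [
    Criterion("relevance", "The response addresses the question asked."),
    Criterion("conciseness", "The response is short and to the point."),
]

question = "What is the refund window?"

# Hypothetical outputs from two agent configurations for the same question.
config_a = {question: "The refund window is 30 days."}
config_b = {
    question: (
        "Thanks for reaching out! We value every customer and are always "
        "happy to help with whatever you may need today or any other day "
        "of the week."
    )
}

scores_a = evaluate_config(config_a, criteria, stub_judge)
scores_b = evaluate_config(config_b, criteria, stub_judge)
print("config A:", scores_a)
print("config B:", scores_b)
```

Swapping `stub_judge` for a function that calls an actual model leaves the rest of the loop unchanged, which is the main appeal of keeping the judge behind a plain callable.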
You can watch the video here:
https://drive.google.com/file/d/1JIXTYbhsaikth8iRJw32l0M-CBSEx50k/view?usp=sharing