Agents and Chains: How to Log, Deploy, and Debug AGENTS with Databricks (With demo!)

lara_rachidi — Tue, 30 Jul 2024 15:44:45 GMT

In this video, we take a deep dive into agents and chains on Databricks! We cover the newest Databricks tools for building compound AI systems and agents, with better evaluation and feedback. Using RAG (Retrieval-Augmented Generation) as a basic agent system, we show how to modularize components and configure them for flexibility. We look at the three main features associated with this new framework: creating and logging agents, deploying agents, and tracing and evaluating them.

Chapters:
00:26 How to create and log agents
08:47 How to deploy agents
11:55 Agent tracing
14:02 Deep dive into Mosaic AI Agent Evaluation
17:57 End-to-end demo!

00:26 First, we show you how to create your agent, how to parameterize it during development, and how to iterate on it. We explain the difference between chains and agents and how to convert a chain into an agent. Chains, such as those built with LangChain, are often hard-coded and lack flexibility. Agents, by contrast, offer a modular and configurable approach that lets large language models (LLMs) make decisions dynamically. We then show how to log a model using the LangChain flavor of MLflow (the pyfunc flavor is also possible if you need more flexibility) and how to register your chain to Unity Catalog (UC).

08:47 Second, we show you how to deploy agents using Model Serving or the deploy() function (which lets you enable the Review app for your agent so that business stakeholders can provide feedback!), and how to leverage the system tables logged during this process: payload, request logs, and assessment logs from the Review app.

11:55 Finally, we show you how to trace and evaluate your agent with response metrics, retrieval metrics, and performance metrics, and how to kick off an evaluation run using the new MLflow evaluation API. We also explore the new LLM-as-a-judge integration.

14:02 Deep dive into Mosaic AI Agent Evaluation: an AI-assisted evaluation tool that automatically determines whether outputs are high quality and provides an intuitive UI for collecting feedback from human stakeholders.
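To make the evaluation ideas from the 11:55 and 14:02 chapters concrete, here is a minimal, stdlib-only Python sketch of the loop that Agent Evaluation automates: run an agent over a few "golden" examples, then score retrieval (precision and recall against the expected source documents), response correctness, and latency. It is not the MLflow evaluation API or Agent Evaluation's built-in judges. The toy corpus, the `retrieve`, `agent`, and `judge_correct` functions, and the `GOLDEN` examples are all hypothetical, and a simple word-overlap heuristic stands in for an LLM judge.

```python
import re
import time

# Hypothetical toy corpus standing in for a vector search index.
DOCS = {
    "d1": "MLflow tracks experiments, models, and traces.",
    "d2": "Unity Catalog governs tables, volumes, and registered models.",
    "d3": "Model Serving exposes models as REST endpoints.",
}


def tokens(text: str) -> set[str]:
    """Lowercased set of words, punctuation stripped."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question: str, k: int = 2) -> list[str]:
    """Hypothetical retriever: top-k doc IDs ranked by word overlap with the question."""
    q = tokens(question)
    return sorted(DOCS, key=lambda d: len(q & tokens(DOCS[d])), reverse=True)[:k]


def agent(question: str) -> tuple[str, list[str]]:
    """Hypothetical RAG agent: answers with the text of the top-ranked document."""
    doc_ids = retrieve(question)
    return DOCS[doc_ids[0]], doc_ids


def judge_correct(response: str, expected: str, threshold: float = 0.5) -> bool:
    """Heuristic stand-in for an LLM judge: enough expected words appear in the response."""
    exp = tokens(expected)
    return len(exp & tokens(response)) / len(exp) >= threshold


# Hypothetical "golden" examples: request, expected source documents, expected answer.
GOLDEN = [
    {"request": "Which tool governs registered models?",
     "expected_docs": {"d2"},
     "expected_response": "Unity Catalog governs registered models."},
    {"request": "How are models exposed as REST endpoints?",
     "expected_docs": {"d3"},
     "expected_response": "Model Serving exposes models as REST endpoints."},
    {"request": "Where do registered models get served?",
     "expected_docs": {"d3"},
     "expected_response": "Model Serving exposes models as REST endpoints."},
]


def evaluate(examples: list[dict]) -> tuple[list[dict], dict]:
    """Score each example on retrieval, correctness, and latency, then aggregate."""
    rows = []
    for ex in examples:
        start = time.perf_counter()
        response, doc_ids = agent(ex["request"])
        latency_ms = (time.perf_counter() - start) * 1000
        hits = len(set(doc_ids) & ex["expected_docs"])
        rows.append({
            "request": ex["request"],
            "retrieval_precision": hits / len(doc_ids),
            "retrieval_recall": hits / len(ex["expected_docs"]),
            "correct": judge_correct(response, ex["expected_response"]),
            "latency_ms": latency_ms,
        })
    n = len(rows)
    summary = {
        "retrieval_precision": sum(r["retrieval_precision"] for r in rows) / n,
        "retrieval_recall": sum(r["retrieval_recall"] for r in rows) / n,
        "correctness_rate": sum(r["correct"] for r in rows) / n,
    }
    return rows, summary


if __name__ == "__main__":
    rows, summary = evaluate(GOLDEN)
    for r in rows:
        print(r["request"], r["retrieval_recall"], r["correct"])
    print(summary)
```

In this toy run, the third example fails the correctness check and also has zero retrieval recall, which points at the retriever rather than the generation step. Reporting retrieval and response metrics separately is what makes that kind of diagnosis possible.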
Agent Evaluation lets you define what high-quality answers look like for your AI system by providing “golden” examples of successful interactions. You can explore permutations of the system (tuning models, changing retrieval, or adding tools) and understand how each change alters quality. Agent Evaluation also lets you invite subject matter experts across your organization, even those without Databricks accounts, to review and label your AI system's output, so you can run production quality assessments and build up an extended evaluation dataset. Finally, system-provided LLM judges can further scale the collection of evaluation data by grading responses on common criteria such as accuracy or helpfulness, and detailed production traces can help diagnose low-quality responses.

17:57 End-to-end demo!! 🙂

If you want to know more about compound systems, check out our video on how to optimize your LLM pipelines with DSPy: https://www.youtube.com/watch?v=ChaS0MkYPmE

Posted in Databricks TV
