Large language models (LLMs) are changing how we interact with technology by applying advanced natural language processing to complex tasks. In recent years, state-of-the-art LLMs have enabled a wide range of innovative applications. Last year marked a shift toward Retrieval-Augmented Generation (RAG), where users built interactive AI chatbots by feeding LLMs their organisational data through vector embeddings.
But we're just scratching the surface. While powerful, Retrieval-Augmented Generation limits our application to static knowledge retrieval. Imagine a typical customer service agent who not only answers questions from internal data but also takes action with minimal human intervention. The semantic understanding and linguistic capability of LLMs let us create fully autonomous decision-making applications that don't just respond to user queries but also “act” on them. The possibilities range from internal data analysis to web searches and beyond.
Databricks Mosaic AI Agent Framework:
Databricks launched the Mosaic AI Agent Framework, which enables developers to build production-scale agents with any LLM. It is a set of tools on Databricks designed to help build, deploy, and evaluate production-quality AI agents, such as Retrieval-Augmented Generation (RAG) applications and much more. Developers can create and log agents using any library and integrate them with MLflow. They can parameterize agents to experiment and iterate on development quickly. Agent tracing lets developers log, analyze, and compare traces to debug and understand how the agent responds to requests.
In this first part of the blog, we will explore agents and their core components, and build an autonomous multi-turn customer service AI agent for an online retail company using one of the best-performing open-source foundation models available on the Databricks platform. In the next part of the series, we will explore the multi-agent framework and build an advanced multi-step reasoning multi-agent application for the same business use case.
What is an LLM Agent?
LLM agents are next-generation advanced AI systems designed for executing complex tasks that need reasoning. They can think ahead, remember past conversations, and use various tools to adjust their responses based on the situation and style needed.
LLM agents are a natural progression of RAG: an approach in which state-of-the-art large language models are given access to external systems and tools so they can make autonomous decisions. In a compound AI system, an agent can be considered a decision engine equipped with memory, introspection, tool use, and more. Think of them as super-smart decision engines that can learn, reason, and act independently, which is the ultimate goal of a truly autonomous AI application.
Core Components:
Key components of an agentic application include:
Central Agent:
The core of the framework is a pretrained, general-purpose large language model that can process and understand data. These are generally high-performing models that you steer with specific prompts. The prompt includes crucial information that guides the agent on how to respond, which tools to use, and the goals it should aim for during the interaction.
You can also customise the agent by assigning it a specific persona, tailoring its characteristics and expertise for tasks or interactions, making it better suited for the situation. In essence, an LLM agent combines advanced processing with customisable features to effectively adapt to diverse tasks and interactions.
Memory:
Memory is an important component of an agentic architecture. It is the storage the agent uses to keep track of conversations. It can be short-term working memory, where the agent holds current information with its immediate context and clears it once the task is completed.
On the other hand, we have long-term memory (sometimes called episodic memory), which holds long-running conversations and helps the agent understand patterns, learn from previous tasks, and recall information to make better decisions in future interactions. These conversations are generally persisted in an external database (e.g. a vector database).
The combination of these two memories allows an agent to provide tailored responses and adapt to user preferences over time. Remember, do not confuse agent memory with our LLM’s conversational memory. The two serve different purposes.
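To make the distinction concrete, here is a minimal sketch of the two kinds of memory. It is an illustration only: the `AgentMemory` class, its method names, and the keyword-based recall are assumptions made for this example, and a plain Python list stands in for the external database.

```python
from collections import deque
from typing import List


class AgentMemory:
    """Toy agent memory: a bounded working buffer plus a persistent store."""

    def __init__(self, max_working_items: int = 4) -> None:
        # Short-term memory: deque(maxlen=...) drops the oldest item when full.
        self.working: deque = deque(maxlen=max_working_items)
        # Long-term memory: a list standing in for an external database.
        self.long_term: List[str] = []

    def remember(self, note: str) -> None:
        """Hold a note in working memory for the current task."""
        self.working.append(note)

    def finish_task(self, summary: str) -> None:
        """Persist a summary of the task, then clear the working memory."""
        self.long_term.append(summary)
        self.working.clear()

    def recall(self, keyword: str) -> List[str]:
        """Naive keyword lookup; a real system would likely use vector similarity."""
        return [s for s in self.long_term if keyword.lower() in s.lower()]


memory = AgentMemory()
memory.remember("user asked about order 1001")
memory.remember("order 1001 has shipped")
memory.finish_task("Customer checked the shipping status of order 1001")
```

After `finish_task`, the working buffer is empty, while `memory.recall("order")` still finds the persisted summary for later conversations.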
Planner:
The next component of an LLM agent is the planning capability, which helps break down complex tasks into manageable subtasks and execute each one. While formulating the plan, the planner can use multiple reasoning techniques, such as chain-of-thought reasoning or hierarchical reasoning (like decision trees), to decide which path to take.
Once the plan is created, agents review and assess its effectiveness through various internal feedback mechanisms. Some common methods include ReAct and Reflexion. These methods help the LLM solve complex tasks by cycling through a sequence of thoughts and observing the outcomes. The process repeats itself for iterative improvement.
In a typical multi-turn chatbot with a single LLM agent, the planning and orchestration are done by a single language model, whereas in a multi-agent framework, separate agents might perform specific tasks like routing, planning, etc.
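The thought, action, observation cycle can be sketched in a few lines. This is a simplified ReAct-style loop under stated assumptions: `scripted_model` is a hard-coded stand-in for a real LLM call, and the `order_status` tool and its data are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical data and tool, used only for this illustration.
ORDERS = {"1001": "shipped", "1002": "processing"}


def order_status(order_id: str) -> str:
    return ORDERS.get(order_id, "unknown order")


TOOLS: Dict[str, Callable[[str], str]] = {"order_status": order_status}


def scripted_model(transcript: List[str]) -> Tuple[str, str, str]:
    """Stand-in for an LLM: returns (thought, action, action_input).

    A real agent would prompt a model with the transcript instead.
    """
    prefix = "Observation: "
    observations = [line for line in transcript if line.startswith(prefix)]
    if not observations:
        return ("I need the order status first.", "order_status", "1001")
    status = observations[-1][len(prefix):]
    return ("I have what I need.", "finish", f"Order 1001 is {status}.")


def run_agent(question: str, model, tools, max_steps: int = 5) -> str:
    """Cycle through thought -> action -> observation until the model finishes."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, action_input = model(transcript)
        transcript.append(f"Thought: {thought}")
        if action == "finish":
            return action_input
        tool = tools.get(action)
        observation = tool(action_input) if tool else f"unknown tool {action!r}"
        transcript.append(f"Observation: {observation}")
    return "Stopped: step limit reached."
```

The `max_steps` guard is a design choice worth keeping in any real loop, since a model that never emits a final answer would otherwise run forever.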
Tools:
Tools are the building blocks of agents; they perform different tasks as directed by the central agent. Tools can be task executors in any form (API calls, Python or SQL functions, web search, code execution, or anything else you want the tool to do). With tools integrated, an LLM agent performs specific tasks via workflows, gathering the observations and information needed to complete subtasks.
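One way to picture this is a registry that maps tool names to functions and descriptions, plus a dispatcher that executes whatever tool call the model emits. The sketch below assumes the model returns a JSON object with `name` and `arguments` keys; the registry helpers and the in-memory product table are hypothetical.

```python
import json
from typing import Any, Callable, Dict

TOOL_REGISTRY: Dict[str, Dict[str, Any]] = {}


def register_tool(name: str, description: str, func: Callable[..., Any]) -> None:
    """Store a tool together with the description the LLM will see."""
    TOOL_REGISTRY[name] = {"description": description, "func": func}


def describe_tools() -> str:
    """Text listing of tools that could be placed in the agent's prompt."""
    return "\n".join(
        f"{name}: {spec['description']}" for name, spec in sorted(TOOL_REGISTRY.items())
    )


def dispatch(tool_call_json: str) -> Any:
    """Execute a tool call of the form {"name": ..., "arguments": {...}}."""
    call = json.loads(tool_call_json)
    spec = TOOL_REGISTRY.get(call["name"])
    if spec is None:
        raise KeyError(f"unknown tool: {call['name']}")
    return spec["func"](**call.get("arguments", {}))


# Hypothetical product data for the example.
PRODUCTS = {"P-100": {"name": "Electric kettle", "price": 49.99}}


def return_product_details(product_id: str) -> dict:
    return PRODUCTS.get(product_id, {})


register_tool(
    "return_product_details",
    "Return product details given a product ID.",
    return_product_details,
)
```

A call such as `dispatch('{"name": "return_product_details", "arguments": {"product_id": "P-100"}}')` looks up the function by name and passes the arguments through as keyword arguments.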
When building these applications, one thing to consider is how long the interaction will run. In a long-running interaction you can easily exhaust the LLM's context limit, and the agent may forget older parts of the conversation. During a long conversation with a user, the control flow of decisions can be single-threaded, multi-threaded in parallel, or in a loop. The more complex the decision chain becomes, the more complex its implementation will be.
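A common way to stay inside the context limit is to keep only the most recent messages that fit a token budget. The sketch below is one simple approach, not the method used in this blog: the word-count token estimate is a deliberate simplification, and both function names are made up for the example.

```python
from typing import Dict, List


def estimate_tokens(text: str) -> int:
    """Crude estimate: whitespace-separated words. Real tokenizers differ."""
    return len(text.split())


def trim_history(messages: List[Dict[str, str]], max_tokens: int) -> List[Dict[str, str]]:
    """Keep the most recent messages whose combined estimate fits the budget."""
    kept: List[Dict[str, str]] = []
    used = 0
    # Walk backwards from the newest message and stop at the first one that does not fit.
    for message in reversed(messages):
        cost = estimate_tokens(message["content"])
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

Dropping old turns is lossy, which is where long-term memory helps: a summary of the trimmed turns can be persisted before they are discarded.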
In Figure 1 below, a single high-performing LLM is the key to decision-making. Based on the user's question, it understands which path it needs to take to route the decision flow. It can utilize multiple tools to perform certain actions, store interim results in memory, perform subsequent planning and finally return the result to the user.
Conversational Agent for Online Retail:
For the purpose of this blog, we are going to create an autonomous customer service AI assistant for an online electronics retailer using the Mosaic AI Agent Framework. This assistant will interact with customers, answer their questions, and perform actions based on user instructions. We can introduce a human in the loop to verify the application's response. We will use Mosaic AI’s tools functionality to create and register our tools inside Unity Catalog. Below is the entity relationship diagram (synthetic data) we built for the blog.
Below is the simple process flow diagram for our use case.
The Mosaic AI Agent Framework supports creating tools and registering them in Unity Catalog (UC). UC allows you to govern and manage access to your tools. UC tools run in a secure, isolated environment within Databricks.
Let’s now dive into the implementation and see how it works in Databricks. The individual tools are the building blocks of the framework. We can write any arbitrary Python or SQL function to perform a task and then register it as a UC function (tool).
Code snippet: (SQL) Order Details
The code below returns order details based on a user-provided order ID. Note the description of the input parameter and the comment on the function. Do not skip function and parameter comments; they are critical for LLMs to call functions/tools properly.
Comments are used as metadata by our central LLM to decide which function to execute for a given user query. Incorrect or insufficient comments can lead the LLM to call the wrong functions/tools.
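A function of this kind might look like the sketch below, which holds the SQL definition in a Python string and runs it through a Spark session. The catalog `main`, schema `retail`, table `orders`, and all column names are placeholders assumed for illustration; only the function name comes from this blog. Note the `COMMENT` on both the parameter and the function.

```python
# Hypothetical names: the catalog, schema, table and columns below are
# placeholders for illustration, not the blog's actual data model.
CREATE_ORDER_DETAILS_FN = """
CREATE OR REPLACE FUNCTION main.retail.return_order_details (
  input_order_id STRING COMMENT 'The order ID to look up'
)
RETURNS TABLE (order_id STRING, order_status STRING, order_date DATE)
COMMENT 'Returns the details of a customer order for a given order ID.'
RETURN
  SELECT order_id, order_status, order_date
  FROM main.retail.orders
  WHERE order_id = input_order_id
"""


def register_function(spark, ddl: str = CREATE_ORDER_DETAILS_FN) -> None:
    """Run the DDL on a Spark session (for example, `spark` in a Databricks notebook)."""
    spark.sql(ddl)
```

In a notebook, `register_function(spark)` would create the function, after which it can be governed like any other Unity Catalog object.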
Code snippet: (SQL) Shipment Details
This function returns shipment details from the shipment table given an ID. Similar to the above, the comments and details of the metadata are important for the agent to interact with the tool.
Code snippet: (Python)
Similarly, you can create any Python function and use it as a tool. It can be registered inside Unity Catalog in a similar manner and gives you all the benefits mentioned above. The example below is the web search tool we built and exposed as an endpoint for our agent to call.
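As a rough idea of the shape such a tool can take, here is a sketch of a search function with type hints and a docstring, which play the same role for Python tools as comments do for SQL functions. The search backend is passed in as a `fetch` callable so the example runs offline; that parameter, the result format, and `max_results` are assumptions, and a deployed tool would call a real search API instead.

```python
from typing import Callable, List


def search_tool(
    keywords: str,
    fetch: Callable[[str], List[str]],
    max_results: int = 3,
) -> str:
    """Search the web for the given keywords and return the top results as text.

    Args:
        keywords: Free-text search terms, for example a product name.
        fetch: Hypothetical backend that maps a query to a list of result snippets.
        max_results: Maximum number of results to include in the answer.
    """
    query = " ".join(keywords.split())  # collapse stray whitespace
    if not query:
        return "No keywords provided."
    results = fetch(query)[:max_results]
    if not results:
        return f"No results found for '{query}'."
    return "\n".join(f"{i}. {snippet}" for i, snippet in enumerate(results, start=1))
```

Returning plain text keeps the observation easy for the LLM to read and quote back to the customer.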
For our use case, we have created multiple tools performing varied tasks like below:
Tool | Description
return_order_details | Return order details given an order ID
return_shipment_details | Return shipment details given a shipment ID
return_product_details | Return product details given a product ID
return_product_review_details | Return a review summary from unstructured data
search_tool | Search the web based on keywords and return results
process_order | Process a refund request based on a user query
Unity Catalog UCFunctionToolkit:
We will use LangChain as the orchestrator to build our chain, in combination with the Databricks UCFunctionToolkit and the Foundation Model APIs. You can use any orchestration framework to build your agents. We need UCFunctionToolkit to build our agent with our UC functions (tools).
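A sketch of loading UC functions as LangChain tools could look like this. It assumes the `UCFunctionToolkit` shipped in `langchain_community` (the import path differs in other packages and versions), a SQL warehouse ID you supply, and the placeholder catalog and schema names `main` and `retail`. The import is done lazily inside the function so the snippet loads even where LangChain is not installed.

```python
from typing import List


def qualified_names(catalog: str, schema: str, functions: List[str]) -> List[str]:
    """Build fully qualified UC function names: catalog.schema.function."""
    return [f"{catalog}.{schema}.{name}" for name in functions]


def load_uc_tools(warehouse_id: str, function_names: List[str]):
    """Wrap the given UC functions as LangChain tools.

    Assumes langchain_community is installed and Databricks credentials
    are configured in the environment.
    """
    # Lazy import: keeps this module importable without LangChain.
    from langchain_community.tools.databricks import UCFunctionToolkit

    toolkit = UCFunctionToolkit(warehouse_id=warehouse_id).include(*function_names)
    return toolkit.get_tools()


# Placeholder catalog and schema; the function names are the tools listed above.
TOOL_NAMES = qualified_names(
    "main",
    "retail",
    ["return_order_details", "return_shipment_details", "return_product_details"],
)
```

Calling `load_uc_tools("<your-warehouse-id>", TOOL_NAMES)` would then return a list of tools ready to hand to an agent.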
Creating the Agent:
Now that our tools are ready, we will integrate them with a large language model hosted on Databricks. Note that you can also use external models via the AI Gateway. For the purpose of this blog, we will use databricks-meta-llama-3-1-70b-instruct, hosted on Databricks.
This is a recent open-source model that has been trained to use tools effectively. Note that not all models are equivalent, and different models will have different tool usage capabilities.
Now that our LLM is ready, we will use a LangChain agent to stitch all of this together and build an agent:
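Putting the model and the tools together might look like the sketch below. It assumes LangChain's `create_tool_calling_agent` and `AgentExecutor` and the `ChatDatabricks` chat model from `langchain_community` (newer releases ship it in a separate Databricks package). The system prompt wording and the temperature are illustrative choices, not taken from the original post.

```python
# Illustrative system prompt; adjust the persona and instructions to your use case.
SYSTEM_PROMPT = (
    "You are a customer service assistant for an online electronics retailer. "
    "Use the available tools to look up orders, shipments and products."
)


def build_agent_executor(tools, endpoint: str = "databricks-meta-llama-3-1-70b-instruct"):
    """Create a tool-calling agent backed by a Databricks-hosted model.

    Assumes langchain and langchain_community are installed and Databricks
    credentials are configured in the environment.
    """
    # Lazy imports: keep this module importable without LangChain.
    from langchain.agents import AgentExecutor, create_tool_calling_agent
    from langchain_community.chat_models import ChatDatabricks
    from langchain_core.prompts import ChatPromptTemplate

    llm = ChatDatabricks(endpoint=endpoint, temperature=0.0)
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", SYSTEM_PROMPT),
            ("placeholder", "{chat_history}"),  # filled in once memory is added
            ("human", "{input}"),
            ("placeholder", "{agent_scratchpad}"),  # intermediate tool calls and results
        ]
    )
    agent = create_tool_calling_agent(llm, tools, prompt)
    return AgentExecutor(agent=agent, tools=tools, verbose=True)
```

With an executor in hand, a question is sent with `agent_executor.invoke({"input": "..."})`, and the agent decides which tools to call.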
Let’s see how this looks in action with a sample question:
As a customer, I am asking the agent the price of a particular product, “Breville Electrical Kettle,” in their company and in the market to see competitive offerings.
Based on the question, the agent decided to execute two functions/tools:
Finally, with the responses from these two functions/tools, the agent synthesizes and returns the answer below. Here, the agent autonomously worked out which functions to execute to answer the user's question and called them on the user's behalf. Pretty neat!
You can also see the end-to-end trace of the agent execution via MLflow Tracing. This helps debugging immensely and gives you clarity on how each step executes.
Memory:
One of the key factors in building an agent is its state and memory. As mentioned above, each function returns an output, and you need to remember the previous conversation to hold a multi-turn conversation. This can be achieved in multiple ways through any orchestrator framework. For this case, we will use LangChain's agent memory to build a multi-turn conversational bot.
Let’s see how we can achieve this with LangChain and the Databricks Foundation Model APIs. We will reuse the previous agent executor and add memory with LangChain's ChatMessageHistory and RunnableWithMessageHistory.
We are using an in-memory chat for demonstration purposes. Once the memory is instantiated, we add it to our agent executor and create an agent with the chat history below. Let’s see what the responses look like with the new agent.
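A sketch of this wiring is shown below. It assumes an existing `agent_executor` whose prompt has a `chat_history` placeholder, and uses `ChatMessageHistory` and `RunnableWithMessageHistory` as named above. The `make_session_store` helper is a hypothetical convenience that keeps one history object per session ID in a plain dictionary, so everything is lost when the process ends.

```python
from typing import Any, Callable, Dict


def make_session_store(new_history: Callable[[], Any]) -> Callable[[str], Any]:
    """Return a lookup function that creates one history object per session ID."""
    store: Dict[str, Any] = {}

    def get_session_history(session_id: str) -> Any:
        if session_id not in store:
            store[session_id] = new_history()
        return store[session_id]

    return get_session_history


def add_memory(agent_executor):
    """Wrap an agent executor so each session keeps its own chat history.

    Assumes langchain_community and langchain_core are installed.
    """
    # Lazy imports: keep this module importable without LangChain.
    from langchain_community.chat_message_histories import ChatMessageHistory
    from langchain_core.runnables.history import RunnableWithMessageHistory

    return RunnableWithMessageHistory(
        agent_executor,
        make_session_store(ChatMessageHistory),
        input_messages_key="input",
        history_messages_key="chat_history",
    )
```

The wrapped agent is then invoked with a session ID in the config, for example `agent_with_memory.invoke({"input": "..."}, config={"configurable": {"session_id": "session-1"}})`. Calls that share a session ID share a history.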
Now that we have defined the agent executor, let’s try some follow-up questions and see if the agent remembers the conversation. Pay close attention to session_id; this is the memory thread that holds the ongoing conversation.
Nice! It remembers the user's previous messages and handles follow-up questions well. Now that you understand how to create an agent and maintain its history, you can test what an end-to-end conversation with a chat agent looks like in action.
One way to do this is to utilize the Databricks Playground. Remember, you can serve the agent you just built as a serving endpoint and use it in the Playground to test your agent's performance.
And that's a wrap! With just a few lines of code, we've unlocked the power of autonomous multi-turn agents that can converse, reason, and take action on behalf of your customers. The result? A significant reduction in manual tasks and a major boost in automation. But we're just getting started! The Mosaic AI Agent Framework has opened the doors to a world of possibilities in Databricks. Stay tuned for the next instalment, where we'll take it to the next level with multi-agent AI - think multiple agents working in harmony to tackle even the most complex tasks. And to top it off, we'll show you how to deploy it all via MLflow and model serving endpoints, making it easy to build production-scale Agentic applications without compromising on data governance. The future of AI is here, and it's just a click away...