kpgireesh
Databricks Employee

Introduction

The world of Artificial Intelligence (AI) is evolving rapidly, and AI Agents are at the forefront of this transformation. These intelligent systems, powered by Large Language Models (LLMs), are redefining how businesses solve complex problems by combining reasoning, planning, memory, and tool integration to achieve goals autonomously. From customer service to research assistance, AI Agents have the potential to revolutionize industries by delivering tailored solutions that require subject expertise, strategic thinking, and decision-making. 

This guide demystifies the process of building AI Agents from scratch using state-of-the-art frameworks like Databricks Mosaic AI Agent Framework. Whether you're a developer, data scientist, IT executive, or business leader, this guide provides actionable insights into architecture design, model selection, tool integration, evaluation strategies, and enterprise-grade deployment. By the end of this journey, you'll have the knowledge to create scalable and reliable AI Agents that can drive innovation in your organization.


 

Understanding the AI Agent Architecture

Selecting an appropriate problem or use case for an Agent, and implementing it well, requires a thorough understanding of an Agent's architectural components. Unlike simpler LLM systems (single prompt-response or chain-of-thought pipelines), Agents are made up of multiple interconnected components that provide reasoning, tool execution, and autonomous action capabilities to achieve their goals. The anatomy of an Agent consists of four major components that define the Agent and its capabilities.


Agent/Brain

The LLM acts as the brain of the Agent: it processes the inputs received and interprets them against the goals to be achieved. It serves as the central decision-making component, instructing tool calls and acting as the command center that orchestrates all other components to work towards the goal.

Planning

Planning is not a separate component but a key stage in the process, where the LLM breaks larger tasks down into smaller, more manageable ones and strategises on the next actions. The planning step involves task decomposition through chain-of-thought reasoning, self-reflection on past actions, and evaluation of the effectiveness of strategies through internal feedback before choosing the next best action. Planning is where the LLM's capabilities are really tested; without robust planning, an Agent will not be able to achieve its goals.

Memory

Memory allows Agents to store and retrieve gathered information so they can review past events and maintain context. Memory consists of two types:

  • Short-term memory: Functions as storage for immediate context to maintain continuity during iterations.
  • Long-term memory: Stores historical information over longer periods of time which can be valuable for future task completion and learnings.

An Agent's memory can either be short-term only, or a combination of short-term and long-term memory, depending on the decision-making inputs needed.

Tools

This component integrates external tools and APIs to extend the Agent's capabilities beyond the LLM's internal knowledge. The tool set should be well designed, providing enough tools with thorough descriptions so that the Agent understands when and how to use them. Examples of tools include web-search APIs, database-access APIs, and specialised functions.

 

Define the Agent

While building an Agent, it's important to establish the use case, purpose, and definition of the Agent as the first step, before diving into the technology aspects. Clarity at this stage is critical for successful outcomes and for technology decisions like model selection, tool choices, and deployment strategies.

Building a case and purpose for the Agent

The critical success factor for an Agent is selecting a use case that genuinely needs Agent capabilities, rather than forcing an Agent onto the problem at hand. Once a use case is identified, we should be able to clearly define the Agent's purpose and goals.

Defining the Agent

Once it's clear what problem the Agent will solve, we progress to defining it. The Agent's definition should include:

  • Problems the Agent will solve
  • Target audience, expected interactions, and response styles
  • Fallback strategy (how to minimise disruptions if the Agent fails)
  • Autonomy level (full, partial, human-in-the-loop)
  • Tools required by the Agent and the tool integration strategy
  • Non-functional requirements such as peak load, response latency, and security considerations
  • Agent evaluation strategy
  • Future plans for extending the Agent's capabilities to new use cases

 

Building the Agent

Once the Agent is defined, we start building it. We will leverage Databricks for building, evaluating, and deploying at scale, as the Databricks Mosaic AI Agent Framework, MLflow, and Model Serving together provide the tools and infrastructure to create, deploy, and monitor enterprise-grade AI Agents with ease. Building an Agent involves the following key steps.

Feasibility assessment - prototype before building

Before jumping into building the Agent, the best practice is to do a feasibility assessment. This is a fast, iterative phase to assess the feasibility of an Agent as a solution before committing to build it, and to identify probable candidate models and tools. This step is critical: if feasibility is not established, we need to go back to the drawing board and redefine the Agent.

Databricks provides out-of-the-box Agent prototyping with AI Playground. Playground provides a UI to select a model, set a prompt, and attach tools to create an Agent prototype with a few clicks, and to test combinations of LLMs, tools, and prompts to create variations and assess feasibility.


Setting the AI Agent framework for Agent development

An AI Agent framework (like LangGraph) serves as the foundation layer, so the choice should be made after evaluating the available frameworks. Select a framework whose capabilities best support the Agent definition we charted. Look for a framework that offers a higher level of abstraction and more pre-built modules to reduce complexity and development time. If a complex use case needs more customisation, go for a framework that provides low-level APIs for interacting with core components. The framework should also meet security and scaling requirements (supporting optimisation, distributed processing, etc.) and should be proven at enterprise level.

Selection of LLM

The next step in Agent development is model selection; the selection should enable the Agent definition we laid out. If the Agent handles tasks ranging from simple to complex, consider a multi-model strategy: basic models for simple tasks and models with superior reasoning capabilities for complex tasks, with dynamic routing to direct inputs to the appropriate model. In short, match model capability to the task, performance, and deployment requirements. If the Agent handles specialised tasks, prefer LLMs that score highly on task-specific benchmarks (like HumanEval if the Agent needs to write Python code). An iterative approach helps in selecting a set of best-suited models, and keep options open to try new models during development and after deployment. LLM selection should balance optimal performance, cost efficiency, scalability, and other operational requirements. Leverage Databricks AI Playground for quick experiments with candidate models.
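The dynamic routing idea above can be sketched in a few lines. This is an illustrative stand-in, not a Databricks API: the model endpoint names and the complexity heuristic are hypothetical, and a production router would typically use a classifier or a cheap LLM call instead of keyword matching.

```python
# Hypothetical sketch of multi-model routing: a cheap heuristic decides
# whether a request goes to a basic or a reasoning-capable model.
# Endpoint names and the scoring rule are illustrative, not real endpoints.

def estimate_complexity(task: str) -> str:
    """Very rough heuristic: long or multi-step tasks count as 'complex'."""
    markers = ("plan", "analyze", "compare", "multi-step", "why")
    if len(task.split()) > 30 or any(m in task.lower() for m in markers):
        return "complex"
    return "simple"

MODEL_ROUTES = {
    "simple": "basic-llm-endpoint",       # fast, low-cost model
    "complex": "reasoning-llm-endpoint",  # stronger reasoning model
}

def route(task: str) -> str:
    """Pick the model endpoint for a given task."""
    return MODEL_ROUTES[estimate_complexity(task)]
```

In practice the router itself should be measured during evaluation, since misrouting complex tasks to the basic model degrades accuracy while over-routing to the large model inflates cost.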

Building Tools

This step implements the tools within the Agent, following the tools and tool-integration strategy decided during Agent definition. Here are the options we can explore.

Databricks Unity Catalog (UC) functions as tools

Databricks Unity Catalog (UC) functions are an effective way to create tools. Databricks provides UCFunctionToolkit, which simplifies creating and using functions as tools, abstracting away complexity and providing a uniform, consistent method for defining and using tools. It also provides auto-tracing to track tool performance and retrieved information, which is valuable for building observability.
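To make the pattern concrete, here is a plain-Python stand-in for what a toolkit like UCFunctionToolkit automates: each tool is a function whose docstring doubles as the description the LLM uses to decide when to call it. The function, its data, and the registry are all hypothetical; this is not the Databricks API.

```python
# Illustrative stand-in for the tool pattern that UCFunctionToolkit
# automates: a function plus a description, held in a lookup registry.
# The order database and tool name are hypothetical.

def lookup_order_status(order_id: str) -> str:
    """Return the shipping status for a given order ID."""
    fake_db = {"A100": "shipped", "A101": "processing"}
    return fake_db.get(order_id, "unknown")

TOOL_REGISTRY = {
    "lookup_order_status": {
        "fn": lookup_order_status,
        "description": lookup_order_status.__doc__,  # shown to the LLM
    },
}

def call_tool(name: str, **kwargs):
    """Dispatch a tool call by name, as an agent framework would."""
    return TOOL_REGISTRY[name]["fn"](**kwargs)
```

With UC functions, the registration, description exposure, and tracing shown here manually are handled by the platform.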

Add third-party tools

Third-party tools can be easily integrated. These can be API-based tools (e.g., a create-order API for placing new orders), document access (e.g., the refund policy documentation to confirm refund eligibility), or third-party APIs (e.g., the Google Maps API for locations, or the Salesforce API to interact with CRM data).

Providing memory

The memory component needs to be designed to meet the information-context and persistence requirements of the Agent (for example: track the last six interactions in short-term memory to maintain context, and retain user preferences from the past year in long-term memory). Store only relevant information, and only for the required duration, to optimise efficiency and reduce latency and cost. Reduce dependence on PII and sensitive data, and where unavoidable, comply with the relevant requirements for storing PII. Prioritise latency for real-time tasks (use an in-memory cache like Redis), and for high-volume or long-term storage go for scalable multimodal databases like Databricks Brickstore. The Databricks Mosaic AI Agent Framework provides seamless and secure integration for building Agent memory.

Building the prompts

Providing effective prompts is crucial for directing the LLM towards the Agent's goals. Prompts typically need many iterations to reach a version that maximises Agent performance. A prompt should include the role and context, a task description, specific step-by-step instructions, the sources of information to use, and the expected output format (length, style, etc.).
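A prompt covering those elements might look like the sketch below. The wording, the retailer scenario, and the tool names are illustrative assumptions, not a recommended template.

```python
# Illustrative system prompt containing the elements listed above:
# role/context, task, step-by-step instructions, sources, output format.

SYSTEM_PROMPT = """\
Role: You are a customer-support agent for an online retailer.
Context: You can look up orders and the refund policy via tools.

Task: Resolve the customer's question about their order or refund.

Instructions:
1. Identify the order ID in the customer's message.
2. Call the order-lookup tool before answering.
3. Cite the refund policy document when discussing refunds.

Output format: a short, polite answer of at most three sentences.
"""

def build_prompt(user_message: str):
    """Assemble the message list passed to the LLM on each turn."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_message},
    ]
```

Iterating on a prompt like this one, and measuring each variant against the evaluation set, is where most of the tuning effort tends to go.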

MLflow Prompt Management is a powerful tool for prompt engineering and management, providing version control, a centralised prompt registry for reusability, change management and collaboration, flexible deployment by isolating prompts from application code, and prompt comparison with tracking of which prompts are used by which models and apps. It's highly recommended to use MLflow for centralised prompt management when building Agents.

Building the Agent - putting it all together

We build the Agent by integrating all components using the framework we selected. The framework takes care of orchestration, tool integration, memory management, and the LLM, bringing all components together to function as an Agent.
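Stripped of framework machinery, the orchestration loop the framework runs on our behalf looks roughly like this. The `fake_llm` stub stands in for a real model call and the single tool is hypothetical; a real loop would also parse structured tool-call output and handle errors.

```python
# Minimal agent loop sketch: the LLM either requests a tool or answers;
# the loop executes tools and feeds results back until done. The stub
# LLM and the tool are stand-ins for real components.

def fake_llm(messages):
    """Stub model: asks for a tool first, answers once a result exists."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "get_time", "args": {}}
    return {"answer": "The current time was fetched via a tool."}

TOOLS = {"get_time": lambda: "12:00"}

def run_agent(user_msg: str, max_steps: int = 3) -> str:
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_steps):
        decision = fake_llm(messages)
        if "answer" in decision:
            return decision["answer"]          # goal reached
        result = TOOLS[decision["tool"]](**decision["args"])
        messages.append({"role": "tool", "content": result})
    return "Stopped: step limit reached."      # safety bound
```

The `max_steps` bound is worth noting: frameworks impose a similar limit so an Agent that fails to converge cannot loop indefinitely.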

 

Evaluation of Agent

Once built, the Agent needs to be evaluated to ensure the definition we charted actually holds, covering functional, non-functional, and behavioural aspects. Evaluation is an iterative process, so the entire solution should be flexible enough to adapt and change based on evaluation results. The recommended critical areas for evaluation are accuracy, stability (consistency and reliability), cost, latency, response style, and security. It is critical to involve business users, the product team, the sales team, and behaviour experts (to validate Agent behaviour and responses) during the evaluation process.
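A toy harness illustrates what measuring two of those areas (accuracy and latency) over a labelled evaluation set involves. The agent and test cases are stand-ins; real evaluation would use LLM judges for open-ended answers rather than exact string matching.

```python
import time

# Toy evaluation harness over a labelled set, reporting accuracy and
# average latency. The agent and the cases are illustrative stand-ins.

def toy_agent(question: str) -> str:
    return {"capital of France?": "Paris"}.get(question, "I don't know")

EVAL_SET = [
    ("capital of France?", "Paris"),
    ("capital of Spain?", "Madrid"),
]

def evaluate(agent, cases):
    correct, latencies = 0, []
    for question, expected in cases:
        start = time.perf_counter()
        answer = agent(question)
        latencies.append(time.perf_counter() - start)
        correct += int(answer == expected)   # exact match; judges in practice
    return {
        "accuracy": correct / len(cases),
        "avg_latency_s": sum(latencies) / len(latencies),
    }
```

Running this over each iteration of the Agent gives the trend data needed to decide whether a prompt, model, or tool change actually helped.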

Mosaic AI Agent Evaluation makes evaluation simple and comprehensive, providing advanced evaluation techniques via SDK and UI to define evaluation criteria specific to your business needs. Features like built-in AI judges, customisable AI judges, custom metrics, and a review app for human reviews cover the qualitative and quantitative metrics used to evaluate quality.


 

Productionising Agent

Once evaluation is successful, the Agent can be moved to production. This phase focuses on building the infrastructure to serve the Agent, ensuring security, and providing governance.

Model Serving

Productionising Agents requires infrastructure that scales at optimal cost, is easy to maintain and operate, and makes model deployment simple.

Databricks Model Serving (a fully managed serverless infrastructure) is an ideal choice for running Agents. It offers simplified deployment with a few lines of code, automatic scaling and optimisation, multi-model serving capabilities, and deep integration with the data platform.

Security and Governance

AI Agents comprise several interconnected components, each presenting distinct security considerations and challenges unlike traditional software. A security-first architecture establishes safeguards at every layer of the Agent system by enforcing least privilege for all Agent components and interactions (for example, granting only read access to certain data in a table rather than full read/update/delete access).
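The least-privilege idea can be sketched as an authorisation check in front of every tool call. The permission names and grant list are hypothetical; on Databricks this kind of check would be enforced by Unity Catalog privileges rather than application code.

```python
# Sketch of least-privilege enforcement on tool calls: each tool
# declares the permission it needs, and the agent's grant list is
# checked before execution. Permission names are illustrative.

TOOL_PERMISSIONS = {
    "read_orders": "orders:read",      # read-only table access
    "delete_order": "orders:delete",   # destructive - rarely granted
}

AGENT_GRANTS = {"orders:read"}         # this agent may only read

def authorize(tool_name: str) -> bool:
    """Return True only if the agent holds the tool's required grant."""
    return TOOL_PERMISSIONS[tool_name] in AGENT_GRANTS
```

Keeping the grant list minimal means a compromised or misbehaving Agent can read but never mutate or delete, which bounds the blast radius of a bad tool call.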

Databricks provides robust capabilities for security and governance through the Mosaic AI Gateway, Unity Catalog and Databricks AI Security Framework (DASF). 

Mosaic AI Gateway provides centralized permissions, guardrails, rate limits for model access, and payload logging, which together ensure compliance, security, auditing, and monitoring, besides providing traffic fallback mechanisms for system reliability. Unity Catalog provides data governance and secure integration with external APIs and enterprise tools, and DASF provides a comprehensive framework for managing AI risks, aligned with recognized standards such as MITRE ATLAS and NIST.

 

Agent monitoring and observability (operating the Agent)

Because Agents are autonomous and complex, implementing robust monitoring and observability systems is critical for ensuring reliability and performance. Without proper observability, Agents become "black boxes" that are difficult to debug and trust. Further, data from monitoring and observability can be used for online evaluation of Agents, both to identify issues and to drive future improvements. This is a rapidly evolving area, and standardized protocols like OpenTelemetry (OTel) and OpenLLMetry are working to unify observability approaches.

Agent Monitoring

We should clearly define the KPIs to be monitored before building the monitoring system (for example, accuracy > 95%, latency < 500 ms) and implement logging that captures the data for these KPIs. The best practice is to integrate with the organisation's central monitoring and alerting mechanism for centralised visibility and response.
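A threshold check feeding a central alerting hook might look like the sketch below. The thresholds mirror the examples above; the KPI names and the dict-based metrics payload are illustrative assumptions.

```python
# Sketch of KPI threshold checks that a central alerting system could
# consume. Thresholds mirror the examples above (accuracy > 95%,
# latency < 500 ms); names are illustrative.

KPI_THRESHOLDS = {
    "accuracy": lambda v: v > 0.95,
    "latency_ms": lambda v: v < 500,
}

def check_kpis(metrics: dict) -> list:
    """Return the names of KPIs that breached their threshold."""
    return [name for name, ok in KPI_THRESHOLDS.items()
            if name in metrics and not ok(metrics[name])]
```

The returned breach list is what would be forwarded to the organisation's central alerting mechanism so Agent incidents surface alongside every other production alert.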

Mosaic AI Agent Monitoring provides operational and quality metrics, with the flexibility to configure the KPIs that operations needs. It also provides evaluation judges (built-in and custom) to track advanced quality metrics.

Agent Observability

Observability means understanding what's happening inside an Agent by examining external signals like logs, metrics, and traces, providing deeper visibility into its internal workings: actions and tool-usage patterns, model calls and responses, and so on.
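To illustrate the trace signal in its simplest form, here is a hand-rolled decorator that records each instrumented call as a span. It is a toy in-memory version of what automatic tracing tooling provides; the tool function is hypothetical.

```python
import functools
import time

# Toy tracing sketch: a decorator records each traced call (span name,
# duration, output) to an in-memory log. Real tracing exports spans to
# a backend instead of a list; the tool below is a stand-in.

TRACE_LOG = []

def trace(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        TRACE_LOG.append({
            "span": fn.__name__,
            "duration_s": time.perf_counter() - start,
            "output": result,
        })
        return result
    return wrapper

@trace
def search_tool(query: str) -> str:
    return f"results for {query}"
```

Each span links an action to its latency and output, which is exactly the data needed to debug why an Agent chose a tool or why a response was slow.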

Databricks Agent observability with MLflow Tracing enables automatic and manual tracing to create a complete, real-time trace of the Agent.


 

Conclusion

Building AI Agents is not just about leveraging cutting-edge technology—it's about creating intelligent systems that can adapt to dynamic business needs and deliver measurable value. From defining the Agent's purpose to deploying it securely at scale, every step in this journey requires careful planning and execution. With tools like Databricks Mosaic AI Agent Framework and MLflow, the complexities of development and deployment are greatly simplified, enabling organizations to focus on innovation rather than infrastructure.

As businesses continue to embrace AI-driven solutions, AI Agents stand out as transformative tools capable of solving real-world challenges with autonomy and precision. Whether you're looking to streamline operations or enhance customer experiences, this guide equips you with the foundational knowledge to build robust AI Agents tailored to your specific use cases. The future of AI is here—are you ready to harness its full potential with Databricks?