The rapid advancements in Artificial Intelligence (AI) are reshaping the landscape of automation, and at the heart of this transformation is the rise of Generative AI Agents.
These agents, powered by large-scale models, are not only automating routine tasks but also introducing new ways of reasoning, decision-making, and interaction that were previously unimaginable.
AI agents can plan, memorise, reason, and act autonomously to achieve pre-defined goals over multiple interactions.
In this blog, we will dive into how these cutting-edge systems are reshaping the financial domain and what it takes to build reliable, domain-specific agents that deliver real value in today’s fast-paced world.
Whether you’re a developer, data scientist, or AI architect, this guide will walk you through building production-grade agents using the Databricks Mosaic AI Agent Framework—from foundational concepts to real-world deployment.
AI agents have the power to change how we work, learn, and interact with the world. However, building these agents is not easy, especially when making them reliable and domain specific. Because of this, many companies focus on creating specialised agents designed for specific tasks.
These agents often rely on enterprise business data, external APIs, a mix of custom code, rules, and careful prompt design to work effectively.
In the competitive world of global finance, portfolio managers face immense pressure to make data-driven decisions in real time, yet traditional methods of stock analysis are slow, manual, and prone to missed opportunities.
Modern financial institutions are turning to AI to revolutionize investment strategies, combining speed, accuracy, and personalization; portfolio managers need tools that pair speed with intelligence.
An AI-based “Investment Assistant Agent” tailored for portfolio managers automates stock analysis, processes real-time market data, extracts insights from historical data, and provides intelligent recommendations.
By leveraging AI to streamline the first level of decision-making, portfolio managers can focus on more strategic initiatives, reduce analysis time, and improve investment outcomes, driving both efficiency and profitability.
An AI agent is an autonomous software system designed to interact with its environment, perceive data, and take actions to achieve specific goals.
These agents simulate intelligent behaviour by continuously learning from their experiences and adjusting their actions based on new information.
An agent tool is essentially a function. “Tool calling” is the process in which the model predicts which tool to use and with what arguments. Tool calling does not imply execution of the function; the model simply generates parameters that can be used to call the function. The application code then chooses how to handle the prediction, typically by calling the indicated function.
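For illustration, a tool call from an OpenAI-style chat model might look like the following; the tool name, arguments, and call id are hypothetical:

```python
import json

# Illustrative shape of a tool call an OpenAI-style chat model might return.
# The model does NOT run anything; it only proposes a function name and
# arguments as structured JSON.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_001",                       # hypothetical call id
        "type": "function",
        "function": {
            "name": "get_stock_quote",          # hypothetical tool name
            "arguments": '{"ticker": "TSLA"}',  # JSON string of arguments
        },
    }],
}

# Application code decides whether and how to execute the proposed call:
args = json.loads(assistant_message["tool_calls"][0]["function"]["arguments"])
# result = get_stock_quote(**args)  # executed by our code, not by the model
```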
Agent frameworks are essential for building, managing, and scaling AI agents in complex applications. They provide enhanced control over workflows, support for multi-agent collaboration, scalability, debugging tools, and seamless integration with external tools.
AI agents are categorised based on their complexity, decision-making processes, and how they interact with their environment. The main types are simple reflex agents, model-based reflex agents, goal-based agents, utility-based agents, and learning agents.
Image source: https://langchain-ai.github.io/langgraph/concepts/multi_agent/#multi-agent-architectures
This framework is designed for rapid experimentation and deployment while maintaining control over data sources.
Databricks Mosaic AI introduces the Mosaic AI Agent Framework, integrated with MLflow, for building high-quality Generative AI applications.
It comprises a set of tools on Databricks designed to help developers build, deploy, and evaluate production-quality agents like Retrieval Augmented Generation (RAG) applications, Text-to-SQL agent, data analyst agent, customer support agent, research agent, business operation assistant, advisory agent and many more.
It focuses on robust evaluation of agent performance through human feedback loops, cost/latency trade-offs, and quality metrics.
It is compatible with third-party frameworks like LangChain/LangGraph and LlamaIndex, and leverages Databricks’ managed Unity Catalog, the Agent Evaluation Framework, MLflow, Model Serving, and other platform benefits.
Here we are going to build a Single Agent using Databricks Agent Framework.
Note:
Source code link is provided at the end.
Financial insights and stock quote prices are extracted from the Yahoo Finance API.
Get access to the Yahoo Finance API and create your own API key (the free key has limitations; please check the dashboard, link below).
Access the Yahoo Finance dashboard for the API specifications.
I have uploaded the CSV files of historical stock prices (source: Kaggle dataset) to Volumes in Databricks Unity Catalog, then created a Delta table from these CSV files.
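A minimal sketch of this step is shown below, assuming the files sit in a Unity Catalog Volume; the catalog, schema, volume, and table names are placeholders to adjust for your workspace:

```python
# Read the uploaded CSV files from a Unity Catalog Volume and persist
# them as a Delta table (paths and names are illustrative).
csv_path = "/Volumes/main/finance/stock_data/historical_prices/"

df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(csv_path))

(df.write
   .format("delta")
   .mode("overwrite")
   .saveAsTable("main.finance.historical_stock_prices"))
```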
I then created synthetic data for customer investment preferences using the Python Faker library and uploaded it to another Delta table.
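A hedged sketch of that step, with illustrative column names and value ranges:

```python
from faker import Faker
import random

fake = Faker()

# Generate synthetic customer investment preferences; the schema here is
# an illustrative assumption, not the project's exact one.
rows = [
    {
        "customer_id": i,
        "customer_name": fake.name(),
        "risk_profile": random.choice(["conservative", "balanced", "aggressive"]),
        "preferred_sector": random.choice(["technology", "energy", "healthcare"]),
        "investment_horizon_years": random.randint(1, 30),
    }
    for i in range(1, 2001)
]

(spark.createDataFrame(rows)
     .write.format("delta").mode("overwrite")
     .saveAsTable("main.finance.customer_investment_preferences"))
```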
Refer to GitHub for the full data preparation scripts.
Note: please create your own data as per your business requirements.
Function Calling: Function calling allows LLMs to generate structured responses more reliably. This capability allows us to use an LLM as an agent that can call functions by outputting JSON objects and mapping arguments. Function calling is explained in my previous blog.
LLMs are not deterministic and are trained on general knowledge from the internet. Generic LLMs cannot access real-time data or organisational data.
For this solution, we will create four Unity Catalog functions that execute SQL and Python code.
Run the code in notebook cells, using the %sql notebook magic to create a SQL-based Unity Catalog function and %python for the Python function.
Python function:
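A hedged sketch of one such function, created from a %python cell with the unitycatalog-ai client. The Yahoo Finance endpoint, catalog, and schema names are assumptions, the API key should come from a Databricks secret rather than a literal, and outbound network access is assumed to be available to the function:

```python
from unitycatalog.ai.core.databricks import DatabricksFunctionClient

client = DatabricksFunctionClient()

def get_stock_quote(ticker: str) -> str:
    """Fetch the latest quote for a ticker from the Yahoo Finance API.

    Args:
        ticker: Stock ticker symbol, e.g. "TSLA".

    Returns:
        The quote payload as a JSON string.
    """
    import json
    import urllib.request
    # Endpoint per the Yahoo Finance API dashboard; replace the key with
    # a value fetched from a Databricks secret.
    url = f"https://yfapi.net/v6/finance/quote?symbols={ticker}"
    req = urllib.request.Request(url, headers={"x-api-key": "<YOUR_API_KEY>"})
    with urllib.request.urlopen(req) as resp:
        return json.dumps(json.load(resp))

client.create_python_function(
    func=get_stock_quote, catalog="main", schema="finance", replace=True
)
```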
SQL function:
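A sketch of one SQL function, run from a %sql cell; the table and column names follow the illustrative schema above:

```sql
-- Expose customer preferences as a governed, callable tool.
CREATE OR REPLACE FUNCTION main.finance.get_customer_preferences(cust_id INT)
RETURNS TABLE (customer_id INT, risk_profile STRING, preferred_sector STRING)
COMMENT 'Returns investment preferences for a given customer id.'
RETURN
  SELECT customer_id, risk_profile, preferred_sector
  FROM main.finance.customer_investment_preferences
  WHERE customer_id = cust_id;
```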
After creating the Unity Catalog function, use the AI Playground to give the tool to an LLM and test the agent. The AI Playground provides a sandbox to prototype tool-calling agents.
Once you’re happy with the AI agent, you can export it to develop it further in Python or deploy it as a Model Serving endpoint as is.
The basic agent package is an auto-generated notebook (driver) created by the Databricks AI Playground export. Please refer to the driver notebook in the GitHub link for the explanation and steps.
Code-based MLflow logging: The chain’s code is captured as a Python file. The Python environment is captured as a list of packages. When the chain is deployed, the Python environment is restored, and the chain’s code is executed to load the chain into memory so it can be invoked when the endpoint is called.
The agent uses the LangChain/LangGraph framework. For other agent authoring patterns, please refer to Author AI agents in code.
In order to log a model from code, you can leverage the mlflow.models.set_model() API. This API allows us to define a model by specifying an instance of the model class directly within the file where the model is defined.
When logging such a model, a file path is specified (instead of an object) that points to the Python file containing both the model class definition and the usage of the set_model API applied on an instance of the custom model.
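Sketched below is the shape of this pattern; the file name, agent builder, and pip requirements are assumptions:

```python
# --- agent.py (the model-definition file) ---
# import mlflow
# AGENT = build_agent()            # hypothetical builder for the LangGraph agent
# mlflow.models.set_model(AGENT)   # register the instance for models-from-code

# --- driver notebook: log the agent by file path, not by object ---
import mlflow

with mlflow.start_run():
    logged_agent_info = mlflow.pyfunc.log_model(
        artifact_path="agent",
        python_model="agent.py",  # path to the file that calls set_model()
        pip_requirements=["mlflow", "langgraph", "databricks-langchain"],
    )
```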
We are using ChatAgent to build the agent.
Databricks recommends using MLflow's ChatAgent interface to author production-grade agents. This chat schema specification is designed for agent scenarios and is similar to, but not strictly compatible with, the OpenAI ChatCompletion schema. ChatAgent also adds functionality for multi-turn, tool-calling agents.
Authoring your agent using ChatAgent provides the following benefits:
- Create and log agents using any library and MLflow.
- Parameterise agents to experiment and iterate on agent development quickly.
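A minimal sketch of the interface, following the LangGraph wrapper pattern from the Databricks examples (the compiled graph passed to the constructor is assumed to be built elsewhere):

```python
from typing import Any, Optional

from mlflow.pyfunc import ChatAgent
from mlflow.types.agent import ChatAgentMessage, ChatAgentResponse, ChatContext


class LangGraphChatAgent(ChatAgent):
    """Wraps a compiled LangGraph graph in MLflow's ChatAgent interface."""

    def __init__(self, agent):
        self.agent = agent  # a compiled LangGraph state graph

    def predict(
        self,
        messages: list[ChatAgentMessage],
        context: Optional[ChatContext] = None,
        custom_inputs: Optional[dict[str, Any]] = None,
    ) -> ChatAgentResponse:
        request = {"messages": self._convert_messages_to_dict(messages)}
        out: list[ChatAgentMessage] = []
        # Stream node updates from the graph and collect the messages each
        # node emits (assumes nodes return message dicts, as in the
        # Databricks LangGraph examples).
        for event in self.agent.stream(request, stream_mode="updates"):
            for node_data in event.values():
                out.extend(ChatAgentMessage(**msg) for msg in node_data["messages"])
        return ChatAgentResponse(messages=out)
```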
Configure the LLM endpoint and system prompt, and define the agent tools. Call mlflow.langchain.autolog() to view the trace for each step the agent takes.
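A hedged configuration sketch follows; the endpoint name, catalog, schema, and function names are assumptions for this walkthrough:

```python
import mlflow
from databricks_langchain import ChatDatabricks, UCFunctionToolkit

mlflow.langchain.autolog()  # trace every step the agent takes

# Foundation-model serving endpoint (name is an assumption)
llm = ChatDatabricks(endpoint="databricks-meta-llama-3-3-70b-instruct")

# Expose the Unity Catalog functions created above as agent tools
uc_tool_names = [
    "main.finance.get_stock_quote",
    "main.finance.get_customer_preferences",
]
tools = UCFunctionToolkit(function_names=uc_tool_names).tools
```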
The system prompt is all about crafting precise, role-based instructions that enable the agent to autonomously handle complex, multi-step tasks with accuracy, transparency, and safety.
Here the prompt clearly assigns the agent the role of an investment assistant, which ensures all actions and responses are grounded in financial best practices and relevant context.
The prompt lays out a sequential process for the agent to follow: identifying tickers, gathering customer preferences, analyzing stock data, and generating recommendations. This structure reduces ambiguity, minimizes errors, and ensures consistent, high-quality outputs.
By instructing the agent to only use explicitly provided information (e.g., customer IDs) and never to assume or fabricate data, the prompt enforces strict privacy and security standards.
The agent is guided to clearly communicate any missing information and base recommendations only on available data, enhancing user trust.
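An illustrative system prompt reflecting these principles (the wording in the actual project may differ):

```python
system_prompt = """You are an investment assistant for portfolio managers.
Follow these steps in order:
1. Identify the stock ticker(s) mentioned in the request.
2. Look up the customer's investment preferences, using ONLY a customer id
   that was explicitly provided. Never assume or fabricate a customer id.
3. Analyse current and historical stock data using the available tools.
4. Generate a recommendation grounded only in the retrieved data.
If required information is missing, say so clearly instead of guessing."""
```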
MLflow Tracing lets you log, analyze, and compare traces across your agent code to debug and understand how your agent responds to requests. Since this notebook calls mlflow.langchain.autolog(), we can view the trace for each step the agent takes.
Test the agent:
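A quick smoke test, following the call pattern of the auto-generated driver notebooks (the question and customer id are examples):

```python
# Invoke the agent directly before logging and deploying it.
AGENT.predict(
    {"messages": [
        {"role": "user",
         "content": "Should I invest in Tesla stocks for customer id 1540?"}
    ]}
)
```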
MLflow Tracing also allows us to compare traces across different versions of a generative AI application, debug the generative AI Python code, and keep track of inputs, tool calls, and responses.
Next, evaluate the agent using the mlflow.evaluate() API with the Databricks agent evaluator, register the agent to Unity Catalog, and deploy it to a Model Serving endpoint. We can edit the requests or expected responses in the evaluation dataset and re-run the evaluation as we iterate on the agent, leveraging MLflow to track the computed quality metrics.
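A hedged sketch of the evaluation step; the evaluation set is a tiny illustrative example, and logged_agent_info comes from the logging step above:

```python
import mlflow
import pandas as pd

# A minimal evaluation dataset; real evaluation sets should cover many
# representative requests and expected responses.
eval_df = pd.DataFrame([
    {
        "request": "Should I invest in Tesla stocks for customer id 1540?",
        "expected_response": "A recommendation grounded in the customer's "
                             "risk profile and current TSLA data.",
    }
])

with mlflow.start_run():
    eval_results = mlflow.evaluate(
        data=eval_df,
        model=logged_agent_info.model_uri,  # from the logging step
        model_type="databricks-agent",      # use the Databricks agent evaluator
    )
```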
Run mlflow.evaluate: a link to “View evaluation results” will be displayed upon successful execution of the mlflow.evaluate() function.
Deploy agents to production with native support for token streaming and request/response logging, plus a built-in review app to get user feedback for your agent.
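A sketch of the registration and deployment steps; the Unity Catalog model name is an assumption:

```python
import mlflow
from databricks import agents

mlflow.set_registry_uri("databricks-uc")  # register into Unity Catalog
UC_MODEL_NAME = "main.finance.investment_assistant_agent"

registered = mlflow.register_model(logged_agent_info.model_uri, UC_MODEL_NAME)

# Creates the Model Serving endpoint and the associated Review App
agents.deploy(UC_MODEL_NAME, registered.version)
```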
Successful deployment will show a link to the Model Serving endpoint, where you can view its status (it generally takes 5–10 minutes to start), and the Review App link.
Let’s test the Review App: “Should I invest in Tesla stocks for customer id 1540?”
Get feedback about the quality of an agentic application using Databricks Review App.
The Databricks Review App offers a robust platform for integrating domain experts and SMEs into the agent development lifecycle through a human-in-the-loop approach. In this controlled environment, stakeholders such as business users and subject matter experts can interact directly with the agent—engaging in conversations, asking questions, and providing critical feedback. Every interaction is recorded in an inference table, enabling detailed performance analysis. This continuous feedback loop helps ensure the quality, safety, and reliability of the agent’s responses while fostering greater trust and adoption.
The agent provides a decent answer with this setup.
The code is available in the GitHub repository, under the folder 2025-04-investment-assistant-with-databricks-agent-framework.
This blog has walked through a practical approach to building a domain-specific AI agent for investment recommendations.
While the prototype agent showcases promising capabilities, such as personalised stock recommendations and real-time insights, it is essential to emphasise the importance of a “human-in-the-loop” approach. This ensures that financial risks are mitigated and the system evolves based on expert feedback.
Moving forward, incorporating additional data sources, refining evaluation metrics, and aligning with organisational strategies and multi-agent architecture can further enhance the agent’s accuracy and reliability.
The Databricks Mosaic AI Agent Framework demonstrates its potential as a robust platform for building intelligent, autonomous agents capable of handling complex tasks like financial analysis. By integrating tools such as Unity Catalog, Agent tools/Function, MLflow, Model Serving, Review App and the AI Playground, it enables rapid experimentation, seamless deployment, and comprehensive evaluation of AI agents.