Are you looking to supercharge your agents on Databricks with Unity Catalog (UC) functions? This guide shows you how to use UC functions with AI agents, making it easier to automate tasks and integrate AI into your workflows. Using UC functions as tools in your agentic system simplifies tool calling, reduces the overhead of managing a function execution context, and brings UC's governance capabilities along for free. Combined with our Mosaic AI platform components, these benefits shorten the development path of your agentic system and open that path to users who are less code-heavy, as explained here. We'll take a hands-on approach, walking through a practical example that demonstrates how to create and use these functions on Databricks. Whether you're just starting out or looking to enhance an existing setup, this tutorial will help you unlock the full potential of Unity Catalog in your AI projects.
Before diving into the hands-on example, let’s briefly review some of the basics.
Agents in AI are autonomous systems that perform tasks independently by interacting with their environment, using tools, and making decisions based on gathered information. Unlike Retrieval-Augmented Generation (RAG) systems, which focus on augmenting language models with external knowledge, AI agents empower models to execute actions and interact with the world through tools and decisions. They do this by applying sophisticated reasoning techniques toward their goals.
LangChain is one framework that facilitates the creation and management of AI agents, but agents can also be implemented using various other frameworks. In this article, we will focus on the LangChain framework but also point you to other possibilities of integrating UC functions with agent frameworks. For instance, for multi-agent coordination, you may want to check LangGraph. In the tips section, we provide an overview of currently supported integrations on Databricks.
With Unity Catalog you can store, access, and govern functions. Functions are units of saved logic that return a scalar value or a set of rows. This lets users define and register executable code as objects and make them accessible to the workspace. With UC functions, you can package custom code to perform different tasks. These functions are stored and treated like any other object in UC, as illustrated below.
Since UC functions inherit all the UC capabilities, they can easily be discovered and managed via the Catalog Explorer UI, as shown below:
To create a UC function, you need the necessary permissions, including USAGE and CREATE permissions on the schema and USAGE permission on the catalog, as illustrated in the figure above. The SQL syntax then follows that of standard SQL functions:
CREATE OR REPLACE FUNCTION my_catalog.my_schema.add_numbers(a DOUBLE, b DOUBLE)
RETURNS DOUBLE
LANGUAGE SQL
RETURN a + b;
Similarly, UC functions can be defined in Python by specifying the language as shown below:
CREATE FUNCTION my_catalog.my_schema.add_numbers(a DOUBLE, b DOUBLE)
RETURNS DOUBLE
LANGUAGE PYTHON
AS $$
return a + b
$$;
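Before either form of CREATE FUNCTION succeeds, the privileges mentioned earlier must be in place. A hedged sketch of what the grants might look like follows; the group name `agent-developers` is hypothetical, and the exact privilege keywords can vary by platform version:

```sql
GRANT USE CATALOG ON CATALOG my_catalog TO `agent-developers`;
GRANT USE SCHEMA, CREATE FUNCTION ON SCHEMA my_catalog.my_schema TO `agent-developers`;
```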
In addition to creating functions directly in SQL, you can leverage the Unity Catalog AI Integrations Function Client. This provides a unified interface through which you can store and call functions across common agent frameworks.
from unitycatalog.ai.core.databricks import DatabricksFunctionClient

client = DatabricksFunctionClient()

CATALOG = "my_catalog"
SCHEMA = "my_schema"

def add_numbers(a: float, b: float) -> float:
    """Add two numbers and return the result."""
    return a + b

function_info = client.create_python_function(
    func=add_numbers,
    catalog=CATALOG,
    schema=SCHEMA,
)
Please note that UC functions in Python carry some limitations: you can define any number of helper functions within a Python UDF, but the registered function must return a scalar value. Python functions must also handle NULL values themselves, and all type mappings must follow the Databricks SQL type system. For more details, have a look here.
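To make the NULL-handling requirement concrete, here is a minimal sketch of how a Python UDF body might guard against NULL inputs before returning its scalar result. The function name and guard pattern are illustrative, not a prescribed convention:

```python
def add_numbers_null_safe(a, b):
    """Add two numbers, propagating NULL the way SQL would.

    Inside a Python UDF, SQL NULLs arrive as Python None, so the
    function body must check for them explicitly before computing.
    """
    if a is None or b is None:
        return None  # mimic SQL NULL propagation
    return float(a) + float(b)  # always return a single scalar value
```

Without the explicit check, a NULL argument would surface as a `TypeError` at execution time rather than a NULL result.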
Having defined UC functions in the language of your choice, you’ll be able to add numerous capabilities to your agents as tools to complete various tasks, which we will discuss more in the next sections.
In an agentic system, agents are given different capabilities so they can autonomously complete different tasks. These capabilities can be provided either as agent code tools or as UC functions. Below, we outline the main differences between the two:
| Agent Code Tools | UC Functions |
| --- | --- |
| Defined within the agent's code | Must be explicitly stated to agents to make them available for use |
| Preferred for more arbitrary code, calling REST APIs, or executing low-latency tools | Preferred for applying transformations and aggregations on large datasets |
| No built-in governance | All the UC governance capabilities |
Most importantly, depending on which of these trade-offs matter for your use case, you can use Agent Code Tools, UC Functions, or fill the gaps by combining both, since both methods are compatible with custom Python/SQL and common agent frameworks like LangGraph.
UC Functions enable organizations to expose common transformations and operations, making them accessible for use in agentic applications. This integration allows agents to execute tasks efficiently and safely, ensuring that data operations are consistent and governed. In this way, UC functions combined with our Mosaic AI capabilities simplify development and contribute to a faster developer loop.
Following the image above, we'd like to explain the power of UC functions in the developer loop. Adding capabilities to agents via UC functions lets developers focus on the logic of their agentic applications, relieving them of the extra work of maintaining governance themselves. Once tools are registered in UC, they are treated like any other UC object and thus inherit all the UC capabilities, such as the UC permission model, easier discoverability in Catalog Explorer, and lineage. Furthermore, if your agents need to connect to external applications like Slack, Google Calendar, or any service with an HTTP API, the authentication step is simplified via the UC security model, as described here. In addition, developers can immediately test their agents with their UC functions in the AI Playground, where they get a feel for the application as end users would. Finally, they can export the interaction as code from the Playground to a Databricks Notebook and build and iterate upon it further. In this way, developers of different skill levels can get started and improve the quality of their agents on the Databricks Mosaic AI platform. We'll follow the same approach for our real-world example, as explained here.
Having understood the concept of UC functions as well as the benefits they bring to the agentic systems, we will now demonstrate how to build an AI agent for fraud detection in insurance claims. The agent uses specialized tools to analyze claim data, detect anomalies, check customer history, and flag potentially fraudulent cases for further investigation, thus supporting claims investigators in identifying suspicious activities efficiently. While this example illustrates just one potential use case, other applications in insurance and beyond include: ticketing support systems, financial risk assessment, healthcare patient data management, supply chain equipment maintenance, and many more.
Insurance companies face increasing pressure to streamline claims processing while maintaining accuracy and minimizing fraud. Detecting fraudulent claims, especially in high-volume segments like car insurance, presents an ideal opportunity to employ AI-driven solutions. Our agent can use machine learning algorithms to scrutinize claims data, compare it with historical patterns, and identify inconsistencies or red flags indicative of fraud. This can automate routine verification tasks and prioritize claims that warrant deeper review, all while reducing the manual workload on human claim investigators.
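The blog does not prescribe a particular detection algorithm, but the idea of comparing claims against historical patterns can be sketched with a simple statistical baseline. The following is an illustrative z-score outlier check, not the actual model used in the example; the threshold of 3 standard deviations is an assumption:

```python
from statistics import mean, stdev

def zscore_outliers(amounts, threshold=3.0):
    """Flag claim amounts that deviate strongly from the historical pattern.

    A claim whose z-score exceeds `threshold` standard deviations
    from the mean is flagged for deeper review.
    """
    mu = mean(amounts)
    sigma = stdev(amounts)
    if sigma == 0:
        return []  # no variation, nothing stands out
    return [a for a in amounts if abs(a - mu) / sigma > threshold]
```

In practice such logic could itself be registered as a UC function, so the same anomaly rule is governed and reused across agents.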
Before delving into the technical implementation in the next section, we will make sure that the requirements are met and the environment is ready so you can follow along with our example.
It is quite straightforward to get started building your agentic systems with UC functions in Databricks. To simplify experimentation and scaling, you may run UC functions on Serverless, always-available compute that requires no manual configuration. For more, please refer here.
Next, you need to ensure you have access to an LLM that supports the tools you want to use in Model Serving, which deploys and governs any type of AI model in Databricks as a REST API endpoint. For more, have a look here. In our example, we will start with the Llama 3.1 405B Instruct model. Please also make sure you have met the following requirements to use the Databricks AI Playground tool.
Not all LLMs support tools, but many are developed or fine-tuned to integrate with external tools. Some notable models that support tool integration include:
Why don’t all LLMs support tools?
Not all LLMs support tools, for several reasons:
Lastly, while this article focuses on an example where using tools makes sense, you do not need agents or tools for every LLM use case. Therefore, depending on what you are trying to achieve, it might make sense to choose a different (smaller) model in some cases. Within Databricks Model Serving, you can choose from the following available models, integrate with external model providers, or use your own custom model.
As mentioned above, the dataset we are using is the Vehicle Insurance Claim Fraud Detection dataset, which is publicly available on Kaggle and was loaded into a Delta table using Spark in a Databricks Notebook. Next, we'll follow the three main steps below:
Once the requirements are met as explained above and the necessary UC permissions are in place as described here, we are ready to create the functions in UC in either Python or SQL. In our example, we want our agent to be equipped with specialized tools that can flag potential risk on claims, assess which claims need manual review, and analyze claims based on their properties. We therefore created all of these capabilities as UC functions using SQL in our own catalog and schema blog_demo.fraud_claims, as shown below:
UC function flagging potential risk based on severity and claim amounts
UC function computing the ratio of individual claim parts to the total.
UC function suggesting why an agent might flag this claim for manual review
UC function analyzing an insurance claim using the previous functions
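The SQL bodies of these functions are shown in the screenshots above. To make the kind of logic concrete, here is a plain-Python sketch of what such tools might compute; every threshold, label, and rule below is a hypothetical illustration, not the blog's actual function definitions:

```python
def flag_potential_risk(severity: str, claim_amount: float) -> str:
    """Return a risk label from claim severity and amount (illustrative thresholds)."""
    if severity == "Severe" and claim_amount > 50_000:
        return "HIGH_RISK"
    if severity == "Severe" or claim_amount > 20_000:
        return "MEDIUM_RISK"
    return "LOW_RISK"

def claim_part_ratio(part_amount: float, total_amount: float) -> float:
    """Ratio of an individual claim part to the claim total (0 when total is 0)."""
    return part_amount / total_amount if total_amount else 0.0

def suggest_manual_review(risk: str, past_claims: int) -> bool:
    """Suggest manual review when risk is elevated or claim history is unusually busy."""
    return risk != "LOW_RISK" or past_claims > 5
```

Registered as UC functions, rules like these become governed, discoverable tools that any agent with the right permissions can call.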
Please note: the above functions are just examples. You could load a dataset of your choice, add your custom logic by creating your own UC functions as illustrated above, and then follow the next steps.
Once we have defined the functions as in the previous step, we can test them immediately in the AI Playground. This helps us get a feel for how our agent works and refine it if needed. As mentioned above, in our example we will use the Llama 3.1 405B Instruct model. To get started and see which endpoints you have available, navigate to the AI Playground under the Machine Learning tile in your Databricks workspace, as shown below:
Once you have chosen your model endpoint, you will be taken to the window where we integrate the functions we created in blog_demo.fraud_claims in the previous step:
With the UC functions in place in the Playground, it is time to test them with some questions, as shown below:
As seen above, our agent works well, calling the right function and flagging the claim as not high risk based on how we defined the function in the previous step. You're also provided with our built-in judges, which in this case assess the model's response as Safe and Relevant. You can view the trace to inspect what is happening behind the scenes. Finally, the Playground suggests follow-up questions if you want to test your tools further, or you can simply ask your own.
After getting a feel for our agentic system and confirming that tool calling works as expected, we can export the whole interaction as Databricks notebook code by selecting the Export option in the Playground, as shown below:
This generates a notebook where you can run, step by step, all the code: how the agent is defined, logged with MLflow, evaluated with Agent Evaluation, deployed and assessed with the Databricks Review App, and finally monitored with Lakehouse Monitoring. This shows how quickly you can get started developing agentic systems with the benefits of UC functions on top, plus the power of the AI Playground generating the code for you, which you can then refine and adjust to your needs.
Many of our customers ask whether using agents over hardcoded workflows makes sense. Although agents have marked a major shift in GenAI applications by bringing autonomy to decision-making, adaptability, and control flow, workflows remain very useful in certain situations. Since workflows run a specific order of actions, they are efficient for routines with a fixed set of steps, and they bring the main benefits of lower latency, easier debugging, and lower costs. However, when you need the system to decide autonomously which actions to take, especially when users may ask for things you cannot predict, agents are the way to go. They represent a more complex architecture, which may bring more cost than a workflow. Weigh these trade-offs when choosing one approach over the other.
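The contrast above can be sketched in a few lines of Python. The workflow runs fixed steps in a fixed order, while the agent-style version routes to a tool based on the request; in a real system an LLM makes that routing decision, which is stubbed here as keyword matching. All function names, enrichment rules, and tool responses are hypothetical:

```python
# Fixed workflow: every claim passes through the same steps, in order.
def validate(claim: dict) -> dict:
    claim["valid"] = claim.get("amount", 0) > 0
    return claim

def enrich(claim: dict) -> dict:
    claim["ratio"] = claim["amount"] / 1000  # hypothetical enrichment step
    return claim

def score(claim: dict) -> str:
    return "review" if claim["valid"] and claim["ratio"] > 5 else "auto-approve"

def workflow(claim: dict) -> str:
    return score(enrich(validate(claim)))

# Agent-style routing: the decision step (an LLM in a real agent,
# stubbed here as keyword matching) chooses which tool to invoke.
TOOLS = {
    "risk": lambda q: "running risk check",
    "history": lambda q: "fetching customer history",
}

def route(query: str) -> str:
    for name, tool in TOOLS.items():
        if name in query.lower():
            return tool(query)
    return "no matching tool; answering directly"
```

The workflow is cheaper and easier to debug because its control flow is known in advance; the router handles requests the workflow's authors never anticipated, at the cost of an extra decision step.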
Additionally, agents with UC functions as tools gain full governance capabilities and complete integration with the Mosaic AI ecosystem. Currently supported integrations can be found here; for limitations and upcoming features, keep an eye on our public site.
UC functions bring all the UC capabilities to your agentic systems. By using UC functions as tools, you get advantages such as lineage, access management, and auditing, just as you would with any other UC object. Furthermore, UC functions are very useful when you need to apply transformations and aggregations over large datasets, removing the need to worry about the function execution context.
In this blog, we illustrated the use of UC functions as agent tools with insurance claims fraud detection as a real-world example. In this agentic system, we showed how to create UC functions as tools agents can use to complete tasks such as assessing claim severity, verifying customer history, and prioritizing requests by urgency to support claims investigators. We also demonstrated how our Mosaic AI platform, with UC's governance layer on top, simplifies building AI agents on Databricks.
The example we provided is one of many possible agent applications, and there are hundreds of other scenarios you could imagine across industries, such as creating a ticketing support system, assessing credit risk and compliance in finance, managing patient data in healthcare, analyzing equipment data and scheduling maintenance in the supply chain, and much more. If, after reading this article, you are curious to get started, let us know your thoughts or approach your Databricks representative with your ideas for building agentic systems using UC functions on Databricks!
Additionally, to get started right away with more hands-on examples, you could check out our demo center, which has the latest updated Databricks demos and tutorials, including agent capabilities like Smart Claims, Lakehouse IoT platform, Lakehouse Retail C360, and more!