Technical Blog
Explore in-depth articles, tutorials, and insights on data analytics and machine learning in the Databricks Technical Blog. Stay updated on industry trends, best practices, and advanced techniques.
alexandergenser
Databricks Employee

What is an AI agent and why build an agent? 

Advanced AI systems go beyond simple question-answering—they can interact with live data, call tools dynamically, and adapt to real-world complexities. In transportation, real-time scheduling is a perfect challenge: users need accurate, up-to-the-minute information on routes, delays, and connections.

This blog demonstrates how to build an AI-powered travel agent using Databricks to integrate real-time train data through external APIs. Rather than relying solely on a large language model (LLM), we design a system with Databricks in which the agent can retrieve live train schedules, process user requests, and return actionable insights. For that, we use the Transport API, which provides train schedule data, and build an agent that can leverage two tools implemented as Unity Catalog functions. We also show how an agent can be deployed and evaluated with the Mosaic AI Agent Framework and Agent Evaluation.

Figure 1: High-level architecture of the Mosaic AI Agent Framework components used.

Key steps covered in this blog: 

  • Building tools with Unity Catalog functions to perform tasks and interact with the main LLM (using SQL or Python).
  • Testing tool performance in the Databricks Playground.
  • Exporting the AI agent solution to a Databricks Notebook, which integrates the Mosaic AI Agent Framework for building and deploying agents.
  • Leveraging Mosaic AI Agent Evaluation to evaluate production-quality AI agents.

By following this approach, you’ll develop a fully functional AI assistant capable of handling real-time train queries, demonstrating how AI can bridge the gap between raw data and intelligent decision-making.

Train connections from Transport API

To build a travel agent that lets users ask for train connections in Switzerland, we use public timetable data from the Transport API provided by Opendata.ch. Note that this approach is not limited to this particular API. You can build an agent in a similar fashion for other railway organizations, for point-to-point travel duration and distance with the Google Distance Matrix API, or for flights with the Skyscanner Developer APIs.

The Transport API builds on REST and provides three endpoints to gather location, connection, and station board data:

  • Locations: Determines the location of a train station given either a query (a string describing the location) or x and y coordinates (latitude and longitude). By default, the API call returns all location types (stations, points of interest, etc.); for this work, the type should therefore be set to ‘stations’. 
  • Connections: Retrieves a list of connections between two specified locations. Two strings, the departure and arrival locations, are required as input. Optional parameters such as departure and arrival times and transport modes can also be provided. 
  • Stationboard: Returns the next connections departing from a specific train station. A station string is required for retrieval. Optionally, the station ID, the limit on retrieved connections, the transportation mode, and whether the connections represent arrivals or departures at a specific time can be specified.

Detailed documentation about the endpoints can be found here. Note that the API allows for an extension to include other transport modes. Building this AI agent, we focus on train connections and station board data, i.e., we leverage the API endpoints /connections and /stationboard.
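To make the endpoint-and-parameter mapping concrete, here is a small standalone URL builder. This helper is our own illustration for this post; it is not part of the API or of the agent code that follows:

```python
from urllib.parse import urlencode

# Base URL of the Transport API (v1)
BASE = "http://transport.opendata.ch/v1"

def build_url(endpoint: str, **params) -> str:
    """Compose a GET request URL for one of the Transport API endpoints."""
    return f"{BASE}/{endpoint}?{urlencode(params)}"

# /connections between two stations ('from' is a Python keyword, hence the dict)
url = build_url("connections", **{"from": "Zurich HB", "to": "Geneva", "limit": 3})
print(url)
```

Sending a GET request to such a URL returns the JSON payloads processed by the agent tools below.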

Creating agent tools with Unity Catalog functions

An agent leverages an LLM as an engine to reason about actions to take. An action involves utilizing tools that allow the agent to, e.g., search the web, retrieve certain internal or external data (e.g., product documentation), or call APIs. Different tools can be created as interfaces for an agent to use when useful for the task at hand. 

In this blog, we will build our tools with Unity Catalog functions. Unity Catalog is a core part of the Databricks Data Intelligence Platform and provides a centralized governance and management system across multiple workspaces, ensuring data consistency and security. Hence, we can benefit from these core advantages using Unity Catalog functions:

  • They are defined and managed in Unity Catalog with built-in security and compliance features.
  • They form a central registry of tools that can be governed like other Unity Catalog objects.
  • They are easier to discover and reuse.
  • They are well suited for calling REST APIs, running arbitrary code, or executing low-latency tools.

This way, we organize, manage, and govern our Unity Catalog functions in the same fashion as our data, models, and other assets made available through Unity Catalog. For functions in particular, the owner can grant EXECUTE permissions to users or service principals, allowing them to use the function. In our case, we utilize Unity Catalog functions to create AI agent tools that execute custom logic and perform specific tasks, extending the capabilities of LLMs beyond language generation to train connection retrieval.

In general, there are four types of agent tools that we are going to distinguish here:

  • Code interpreter tools: You can use this type of tool to execute a custom Python implementation with an agent.
  • Structured data retrieval tools: A tool that executes a SQL query enables your agent to query structured data already present in a catalog and schema.
  • Unstructured data retrieval tools: When your Vector Search Index (VSI) contains unstructured data, you can easily query that data using SQL and the Databricks SQL function vector_search(). You simply direct the function to your VSI and pipe in your query.
  • External connection tools: Connecting to APIs or other external applications and handling the return value with a custom code implementation.

In the following example, we will focus on external connection tools as we need to connect to the Transport API and implement custom Python code that handles the request and processes the response before returning information to the agent. Nevertheless, the flexibility outlined above highlights that there are no limits when it comes to defining, managing, and especially governing your custom agent tools. 
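Stripped of any specific API, an external connection tool follows one recurring shape: call out, handle failure, and always hand the agent back a string. The sketch below captures that pattern; the `fetch` parameter and helper names are our own, injected so the logic can be exercised without network access:

```python
import json

def external_tool(fetch, **params) -> str:
    """Generic external-connection tool body: call an API via `fetch`,
    handle failures, and always return a string to the agent."""
    try:
        status, payload = fetch(**params)
    except Exception as exc:
        return f"Request failed: {exc}"
    if status != 200:
        return f"Failed to retrieve data. Status code: {status}"
    # Serialize the payload so the tool's declared STRING return type holds
    return json.dumps(payload)

# Usage with a stubbed fetcher instead of a live HTTP call:
print(external_tool(lambda **p: (200, {"connections": []})))
```

The two Unity Catalog functions built next are concrete instances of this pattern, with `requests.get` playing the role of `fetch`.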

Setting up a catalog and schema in Unity Catalog

To integrate your train connection agent with Unity Catalog, you'll need to first create a catalog and a schema for your project. This allows you to segregate your projects and also to perform fine-grained governance. We will do this here with two simple SQL statements in a notebook:

%sql
CREATE CATALOG IF NOT EXISTS travel_agents;
CREATE SCHEMA IF NOT EXISTS travel_agents.train_agent;

Implementing the agent tools

In this small project, we will create two functions that the agent can use as tools. First, we want to retrieve train connections from the Transport API. According to the documentation of the connections endpoint, a departure and an arrival location are required for the API request. In addition, we implement the via option in case a user prompts the agent with, e.g., ‘I want to go from Zurich to Geneva but I need to make a stop in Luzern’.

The Transport API does not require any authentication. Therefore, we don’t need to create a connection here. Nevertheless, if you want to use an external service where authentication is required, you can leverage Unity Catalog connections to do so.

We start by using CREATE FUNCTION and implementing the prototype of the function, i.e., we specify the three parameters and the return type as a string. Importantly, we use the COMMENT keyword to document a) what each input parameter represents and b) what functionality the function provides. This matters because the agent uses this information to reason about whether a tool is the right choice for a given task.

The core functionality of the defined function is implemented in Python. What follows is a typical implementation of a REST API call with the requests package. We declare a dictionary of parameters, where the keys are the parameter names from the API documentation and the values are the Unity Catalog function's input parameters. Finally, we process the payload by returning the content of the connections element. If the request fails, an error message is returned to the agent.

%sql
CREATE OR REPLACE FUNCTION travel_agents.train_agent.get_connections(
   from_station STRING COMMENT 'The train station of departure',
   to_station STRING COMMENT 'The train station of arrival',
   via_station STRING COMMENT 'The desired stops in between departure and arrival'
)
RETURNS STRING
COMMENT 'Executes a call to the transport API and connections endpoint to retrieve relevant train connections given the input parameters from (departure), to (arrival), via (stops in between, if specified).'
LANGUAGE PYTHON
AS $$
import requests
import json

url = "http://transport.opendata.ch/v1/connections"
params = {
   "from": from_station,
   "to": to_station,
   "via": via_station,
   "transportations": "train"
}

response = requests.get(url, params=params)

if response.status_code == 200:
   next_connection = response.json()
   # Serialize the list of connections to match the declared STRING return type
   return json.dumps(next_connection['connections'])
else:
   return f"Failed to retrieve connection. Status code: {response.status_code}"
$$;
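The response handling can also be unit-tested outside Databricks with a canned payload. The field names below (connections, from, to, departure, station) follow the Transport API's documented response shape; summarize_connections is our own illustrative helper, not part of the Unity Catalog function above:

```python
def summarize_connections(payload: dict, max_results: int = 3) -> list[str]:
    """Reduce a raw /connections payload to short, agent-friendly lines."""
    lines = []
    for conn in payload.get("connections", [])[:max_results]:
        dep = conn["from"]["departure"]
        origin = conn["from"]["station"]["name"]
        dest = conn["to"]["station"]["name"]
        lines.append(f"{origin} -> {dest}, departing {dep}")
    return lines

# Canned payload mimicking the documented response shape:
sample = {
    "connections": [
        {"from": {"departure": "2025-05-20T17:02:00+0200",
                  "station": {"name": "Zürich HB"}},
         "to": {"station": {"name": "Genève"}}}
    ]
}
print(summarize_connections(sample))
```

Trimming the payload like this keeps the tool output compact, which reduces the amount of raw JSON the LLM has to reason over.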

In a very similar fashion, we create another function that retrieves the station board of a given train station, using the stationboard endpoint of the Transport API. The station parameter (a string) is required, and we use the optional type parameter to indicate whether the user wants arrival or departure data:

%sql
CREATE OR REPLACE FUNCTION travel_agents.train_agent.get_station_board(
   station STRING COMMENT 'The train station for which to retrieve the station board',
   arrival_or_departure STRING COMMENT 'Whether to return arrival or departure data'
)
RETURNS STRING
COMMENT 'Executes a call to the Transport API stationboard endpoint to return the next train connections arriving at or departing from the given station.'
LANGUAGE PYTHON
AS $$
import requests
import json

url = "http://transport.opendata.ch/v1/stationboard"
params = {
   "station": station,
   "type": arrival_or_departure,
   "limit": 15,
   "transportations": "train"
}

response = requests.get(url, params=params)

if response.status_code == 200:
   station_board = response.json()
   # Serialize the payload to match the declared STRING return type
   return json.dumps(station_board)
else:
   return f"Failed to retrieve station board. Status code: {response.status_code}"
$$;

Note that the hard-coded values for the limit of returned records and the transportation mode can be changed according to the documentation or introduced as function parameters. The code for the UC functions can also be found in the GitHub repository here.

Finally, we check in Unity Catalog whether our functions were created as expected. Given the three-level namespace of Unity Catalog, we expect a catalog travel_agents, a schema train_agent, and the two functions get_connections and get_station_board.


Figure 2: Overview of created functions in the Unity Catalog explorer. 

 

Building your small travel AI agent in Databricks Playground

In this section, we will walk through the step-by-step process of creating an AI agent in Databricks Playground, integrating it with real-time train schedule data from the Swiss rail network. By the end of this guide, you will have a working AI agent capable of answering queries like:

"I want to go from Zurich to Geneva now. Is there a train leaving in the next hour?"

Before integrating the tools into an AI agent, we test them in the Databricks Playground to ensure they work as expected. Testing in a controlled environment ensures the agent has reliable access to real-time data before deployment. More details can be found here.

Step 1: Access the Databricks Playground

Before adding tools to the AI agent, we must set up the environment and verify tool functionality. The Databricks Playground provides a controlled space to test and validate tools before full deployment.

  1. Open the Playground:
    1. In your Databricks Workspace, navigate to Machine Learning > Playground from the left-hand menu
  2. Select an LLM (Large Language Model) that supports tool calling (indicated by the Tools Enabled icon).
    1. This LLM will act as the reasoning engine of our AI system. 
    2. In our case, we are going to select the foundation model Meta Llama 3.3 70B Instruct.


Figure 3: Navigating to the Databricks Playground

Step 2: Add Tools to the AI Agent

Now, we integrate the travel planning tools created above into the AI agent. These tools allow the agent to find train connections and retrieve station board data from any train station in Switzerland in real-time.

  1. Click the Tools dropdown arrow.
  2. Select "Add Tools" > "Add Hosted Function".
  3. Choose the Unity Catalog functions you created earlier (get_connections, get_station_board).


Figure 4: Selection of tools as Unity Catalog functions in the Playground.

Step 3: Adding a system prompt

A system prompt needs to be specified to define the agent's behavior, set constraints, and provide context. For this travel AI agent, the system prompt takes care of the following:

  • The focus of the agent: providing travel information for the transportation mode ‘trains’, excluding other actions such as booking tickets.
  • Specification of data sources and actions: describing the tools the agent can leverage, so it can focus on the action it should take. 

The complete prompt template can be found in the GitHub repository here. To add the system prompt to our test environment: 

  1. In the Databricks Playground, click ‘Add system prompt’ and copy-paste the prompt template into the input field. 

Step 4: Test and Validate the Agent

Before deploying the AI agent, we must ensure it correctly interprets queries and calls the right tools.

  1. Enter Sample Queries

Example: 

Prompt: "Find the next train from Zurich to Bern."

The agent should utilize the tool get_connections and return relevant train connections.

Example: 

Prompt: "What are the next three trains leaving from Geneva?"

The agent should utilize the tool get_station_board with the departure parameter and extract the next three connections from the retrieved payload.

If the response is incorrect, verify that:

  • The Unity Catalog functions return the expected results.
  • The agent is correctly calling the tools based on the input queries.
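Such spot checks can also be written down as a small routing table, making the expected tool per sample query explicit. The mapping and helper below are our own sketch for offline validation, not a Playground API:

```python
# Expected tool per sample query, taken from the two examples above.
EXPECTED_TOOL = {
    "Find the next train from Zurich to Bern.": "get_connections",
    "What are the next three trains leaving from Geneva?": "get_station_board",
}

def tool_choice_ok(query: str, called_tool: str) -> bool:
    """True when the agent routed the query to the tool we expect."""
    return EXPECTED_TOOL.get(query) == called_tool

print(tool_choice_ok("Find the next train from Zurich to Bern.", "get_connections"))
```

Keeping such a table alongside the project makes tool-routing regressions easy to spot as you refine the system prompt.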

Step 5: Export and Deploy the AI Agent

Once the AI agent functions as expected, we export the setup into a Databricks Notebook for deployment.

  1. Export the Agent
    • Click "Export" in the Playground.
    • This generates a Databricks Notebook with the full agent configuration.
  2. Navigate to the Exported Notebook
    • Locate the exported driver notebook in your workspace.
    • Configure the catalog, schema, and model names (look for "TODO" comments).


Figure 5: Exporting the automatically generated agent implementation.

By following these steps, you now have a fully functional AI-powered travel assistant, capable of answering real-time train schedule queries using the Databricks Playground and the Swiss Transport API. The driver notebook allows you to define, test, evaluate, and deploy the agent. Be aware that the notebook contains a few sections where information needs to be filled in (e.g., the catalog, schema, and model name the agent should be deployed to). The code with these steps incorporated is available in the GitHub repository here.

Agent evaluation with the Mosaic AI Agent Framework

When you export your AI agent using the driver notebook in Databricks, both Agent Evaluation and the Review App are automatically integrated. This enables you to assess and refine the agent before deployment, ensuring that it is fully optimized and ready for real-world queries.

Agent evaluation is a crucial process for ensuring that the AI agent performs as expected, delivering accurate, efficient, and reliable responses. The Mosaic AI Agent Evaluation framework helps automate the process of testing and assessing agent performance, optimizing the system before going live. It evaluates the agent’s effectiveness based on key criteria like accuracy, consistency, and efficiency.

Here, we specify three global guidelines following the documentation's example: a rejection guideline ensuring the agent declines queries unrelated to train travel, a conciseness guideline requiring the response to include a train line and details about the journey, and a guideline ensuring that the response is professional. By default, the evaluation framework also checks for relevance and safety. The corresponding code is in the GitHub repository here.
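As a sketch, such guidelines take the shape of a plain mapping from guideline names to lists of instructions, which is the form the framework's global_guidelines configuration accepts. The wording below paraphrases the three guidelines; see the repository for the exact version:

```python
# Guideline names mapped to lists of natural-language instructions,
# mirroring the dict-of-lists shape Mosaic AI Agent Evaluation accepts.
global_guidelines = {
    "rejection": [
        "Decline any request that is not about Swiss train travel."
    ],
    "conciseness": [
        "The response must name a train line and include details about the journey."
    ],
    "professional": [
        "The response must use a professional tone."
    ],
}

# This dict is passed to mlflow.evaluate(...) via
# evaluator_config={"databricks-agent": {"global_guidelines": global_guidelines}}.
print(sorted(global_guidelines))
```

Each guideline is judged independently, so narrow, single-purpose instructions tend to give more actionable evaluation results than one long combined rule.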


Figure 6: Evaluation framework example with custom-defined global guidelines.

Along with the framework, the Agent Review App allows users to simulate real-world queries and assess the agent's performance. It provides an interactive space to evaluate responses, identify potential issues, and fine-tune the agent. If you execute all cells of the driver notebook exported above from the Playground, you get an automatically generated link to the Review App after deploying your agent. Alternatively, you can also open the Agent Review App with this code snippet:

import mlflow
from databricks.agents import review_app

# The review app is tied to the current MLflow experiment.
mlflow.set_experiment("same_exp_used_to_deploy_the_agent")
my_app = review_app.get_review_app()
print(my_app.url)
print(my_app.url + "/chat") # For "Chat with the bot".

By leveraging these tools, you can continuously monitor and improve the AI agent, ensuring it meets the desired performance standards. This makes it easier to deploy reliable and high-performing AI agents that are equipped to handle real-world queries effectively.

Best Practices for Building AI Agents

  1. Ground AI Agents in Reliable Data Sources

To ensure accurate and trustworthy responses, AI agents could leverage retrieval-augmented generation (RAG) and integrate with structured data sources. By using external databases, APIs, or enterprise knowledge bases, AI agents can dynamically fetch real-time information instead of relying solely on pre-trained knowledge, reducing hallucinations and improving decision-making. Here is an example of calling RAG with tools in Databricks.

  2. Continuously Evaluate and Optimize Performance

AI agents should be assessed regularly to measure accuracy, response quality, and efficiency. Databricks Mosaic AI Agent Evaluation Framework provides structured evaluation methods, allowing developers to refine agent behavior, debug issues, and improve tool usage. Iterative testing in Databricks Playground ensures robust performance before deployment.

  3. Ensure Transparency, Governance, and Responsible AI

AI agents must be transparent, secure, and aligned with ethical AI principles. Databricks Unity Catalog enforces governance with role-based access control, while the Databricks Review App helps users analyze and refine AI agent outputs. For organizations prioritizing fairness, accountability, and bias mitigation, Databricks provides comprehensive resources on Responsible AI to guide best practices.

By following these principles, businesses can deploy AI agents that are reliable, scalable, and aligned with enterprise goals.

Conclusion

In this blog, we demonstrate the ease of building AI-powered travel agents using Databricks. By using the Data Intelligence Platform and in particular the Mosaic AI Agent Framework, we've created a sophisticated AI assistant capable of handling real-time train queries for the Swiss rail network.

Key takeaways from this guide include:

  1. The seamless creation of reusable, secure, and governable agent tools with Unity Catalog functions and the integration of external APIs (Transport API).
  2. The simplicity of testing and refining AI agents using the Databricks Playground.
  3. Automatic generation of a Databricks Notebook with the full agent configuration for a quick time to production.
  4. The importance of continuous evaluation and optimization using the Mosaic AI Agent Evaluation Framework.

This project serves as a foundation that can be easily extended and adapted for various transportation applications or other domains with AI agent applications. Moreover, the work presented in this blog can be further enhanced using Databricks Apps, allowing for the creation of interactive, user-friendly interfaces for your AI agents. With Databricks Apps, you can deploy your travel assistant as a full-fledged application, complete with custom UI elements, making it even more accessible and valuable to end-users.