caiofarias, Databricks Employee

AI-driven Supply Chains

All supply chain activities rely on Enterprise Resource Planning (ERP) systems. Integrating these data-rich ERP systems with modern AI-driven analytics can significantly enhance supply chain operations by unlocking deeper insights, accelerating decision-making, and improving efficiency. Two prominent approaches to generating AI-powered insights with Large Language Models (LLMs) are Retrieval-Augmented Generation (RAG) and Graph Retrieval-Augmented Generation (GraphRAG). Understanding the differences, strengths, and limitations of these methodologies is critical when integrating structured ERP data into AI workflows. Both techniques offer substantial benefits, yet neither is a universal solution; each excels in specific scenarios. Throughout this blog, we will explore these methods in depth, demonstrate how and when to use each effectively, and show how to expand these solutions into dynamic, adaptable "agentic applications" that can grow through additional tools or integration into broader multi-agent systems, making them suitable for complex ERP-driven supply chain workflows.

Note: You can view the source code for this blog on GitHub.

RAG versus GraphRAG

Retrieval-Augmented Generation (RAG) leverages the retrieval of relevant documents or textual data to inform AI-generated responses. It excels at capturing semantic similarity and can rapidly adapt to changing textual knowledge sources. However, its reliance on unstructured or semi-structured textual data limits its ability to navigate complex, explicit relationships inherent in structured business processes, such as those in supply chains.

In contrast, GraphRAG enhances AI's reasoning capabilities by leveraging structured data represented explicitly as knowledge graphs. GraphRAG uses graph-based retrieval methods, enabling more precise and contextually rich insights derived from interconnected data. While GraphRAG demands higher setup complexity and maintenance efforts due to structured graph management, it provides superior accuracy and transparency for domains with intricate relational data.

 

| Criteria | RAG | GraphRAG |
| --- | --- | --- |
| Knowledge Representation | Unstructured/semi-structured documents | Explicitly structured graph knowledge |
| Retrieval Method | Embedding similarity search | Graph traversal & embedding search |
| Complexity | Lower; easier to implement | Higher complexity, setup, and maintenance |
| Accuracy/Reasoning | Good for broad questions or straightforward retrieval | Superior for complex reasoning and explicit relations |
| Explainability | Moderate | Higher (explicit relationships) |
| Scalability | High for documents (requires embeddings) | Moderate, due to computational overhead |
| Tools | Databricks Vector Search | Specialized graph databases |

Table summarizing the differences between RAG and GraphRAG.

 

Supply Chain Knowledge Graphs 

Supply chain data naturally fits into a graph structure due to its inherent relationships and dependencies among various components, including suppliers, products, inventory, locations, transportation routes, and transactions. Representing supply chain data in a knowledge graph enables businesses to visualize and understand these complex relationships, promoting improved decision-making and strategic planning.

Companies can utilize Databricks to couple the enhanced visibility provided by graph databases with LLM-powered “agents” using the Mosaic AI Agent Framework. This enables organizations to leverage AI to make proactive, data-driven decisions, enhance operational agility, and optimize supply chain performance.

Why Use Databricks to Build Agents

Databricks provides a framework for building and governing AI agents within the Data Intelligence Platform. This allows secure integration and customization of enterprise-specific data, ensuring accurate, relevant AI outputs. Databricks also provides automated tools for rapidly evaluating and improving AI agents, facilitating agile development and robust end-to-end governance. This integrated approach ensures high-quality performance, comprehensive oversight, and adaptability, enabling businesses to confidently deploy AI agents that automate tasks, enhance decision-making, and optimize operations across diverse business functions.

Unity Catalog Integration

Unity Catalog is the Databricks offering that unlocks enterprise Data Intelligence. It provides a unified data governance layer for managing, discovering, and securely sharing data across organizational teams. It enhances compliance with regulatory standards by enforcing robust data access controls and detailed audit logging. Additionally, Unity Catalog simplifies collaboration through centralized metadata, credential management, and data lineage, enabling traceability and accountability across analytics workflows and enterprise operations.

Major enterprises have relied on Databricks to unlock Data Intelligence and apply AI advancements to their proprietary data to solve domain-specific problems. Databricks is uniquely positioned for Data Intelligence because it allows you to govern all your data in a single estate, providing features designed specifically for creating production-ready agentic applications. In the next section, we will uncover how each of these features can be used to create an intelligent chat agent that understands your supply chain.

How to Build a Supply Chain Agent

Note: The code used in this scenario can be found here. Follow the README to set up the prerequisites and execute the notebooks to deploy the GraphRAG application. Here we will walk through the high-level process and cover key application features.

Let’s clarify a few concepts:

  • An agent is a combination of an LLM and one or more tools. Tools are functions designed to help an LLM perform specialized tasks (such as database lookups).
  • The purpose of our agent is to interpret user questions using natural language and look up data from the graph to use as context for answering these questions.

The first requirement is to prepare and load the supply chain dataset into a graph database so the LLM can retrieve data. At this point, we have two key considerations to address. First, selecting the right graph database provider is essential: given the many options available, evaluating cost, scalability, maintenance, and alignment with business requirements is critical. Second, the ERP source system in use must be identified. Each ERP platform has a unique data model and integration needs; some may offer built-in graph representations, while others require preprocessing to shape the data appropriately. Fortunately, Databricks excels at large-scale data transformation and supports integration with virtually any data source, including the most widely used ERP systems.

Databricks recently established a strategic partnership with SAP, one of the most widely used ERP providers. For this example, we therefore chose the SAP Bike Sales dataset, which contains global sales and distribution data for a fictional bike distributor. SAP makes this dataset publicly available, and it is easily transformed into a graph representation. As for the database, we will be using Neo4j Aura, a popular cloud solution with robust Databricks connectivity.

We won't be connecting directly to an SAP instance for this example. SAP provides the dataset in CSV format, which we can load into Neo4j by executing load commands on the database, as sketched below. The queries first create all of the graph's nodes (such as products, employees, customers, and suppliers) and then link the nodes together with relationships. These relationships represent the paths connecting people and items when executing sales orders.
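The load script in the repository covers the full dataset; purely as an illustration, a single entity load run through the Neo4j Python driver might look like the following. The CSV URL, labels, and column names here are hypothetical, not the actual Bike Sales script.

from neo4j import GraphDatabase
import os

driver = GraphDatabase.driver(
    os.getenv("NEO4J_HOST"), auth=("neo4j", os.getenv("NEO4J_KEY"))
)

# Create Product nodes from a hosted CSV (illustrative URL and columns)
load_products = """
LOAD CSV WITH HEADERS FROM 'https://example.com/Products.csv' AS row
MERGE (p:Product {id: row.PRODUCTID})
SET p.category = row.PRODCATEGORYID
"""

# Link each product to its supplier with a SUPPLIES relationship
link_suppliers = """
LOAD CSV WITH HEADERS FROM 'https://example.com/Products.csv' AS row
MATCH (p:Product {id: row.PRODUCTID})
MERGE (s:Supplier {id: row.SUPPLIERID})
MERGE (s)-[:SUPPLIES]->(p)
"""

with driver.session() as session:
    session.run(load_products)
    session.run(link_suppliers)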

After loading the data into the database, use the Aura DB console to explore the supply chain network. Then, move to Databricks to build the agent chain.

Agent Framework

For the agent to utilize the graph, we must connect to Neo4j and register the Graph Retriever as a tool. The Mosaic AI Agent Framework provides specialized tooling for creating and improving tool-calling agents, including a simplified SDK that lets you stitch together the components of your RAG application from open-source Python libraries and iterate quickly using MLflow.

Our supply chain agent uses Meta Llama 3.3 70B Instruct as the base model and connects to Neo4j through the prebuilt GraphCypherQAChain. LangChain ties these building blocks together.

 

from databricks_langchain import ChatDatabricks
from langchain_neo4j import Neo4jGraph, GraphCypherQAChain
from langchain_core.tools import Tool
import os

# Initialize the LLM served by a Databricks Model Serving endpoint
# (`config` is loaded earlier in the notebook)
llm = ChatDatabricks(
    endpoint=config.get("llm_endpoint")
)

# Connect to the Neo4j Aura instance that holds the supply chain graph
graph = Neo4jGraph(
    url=os.getenv("NEO4J_HOST"),
    username="neo4j",
    password=os.getenv("NEO4J_KEY")
)

# Chain that translates a natural-language question into Cypher,
# runs it against the graph, and summarizes the results
chain = GraphCypherQAChain.from_llm(
    graph=graph,
    cypher_llm=llm,  # generates the Cypher query
    qa_llm=llm,      # composes the final answer
    validate_cypher=True,
    allow_dangerous_requests=True,
    use_function_response=True,  # pass query results to the LLM as a function response
    verbose=True
)

# Expose the chain to the agent as a callable tool
graph_tool = Tool(
    name="graph_qa_tool",
    func=chain.invoke,
    description="Generates a Neo4j Cypher query to answer questions and executes the query on the graph instance."
)

# `tools` is the agent's tool list, defined earlier in the notebook
tools.append(graph_tool)

 

Next, enable MLflow Tracing for observability and log the agent as an MLflow model.
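A minimal sketch of this logging step, assuming the code-based logging pattern used by the Agent Framework templates (the agent.py file name and its contents are assumptions here, not the repository's exact code):

import mlflow

# Enable MLflow Tracing for all LangChain calls (observability)
mlflow.langchain.autolog()

# Code-based logging: agent.py is assumed to define the chain and
# expose it via mlflow.models.set_model()
with mlflow.start_run():
    logged_agent_info = mlflow.pyfunc.log_model(
        artifact_path="agent",
        python_model="agent.py",
    )

With the agent logged, register it to Unity Catalog and deploy it for testing and feedback using the simple agents.deploy() command: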

 

import mlflow
from databricks import agents

mlflow.set_registry_uri("databricks-uc")

# TODO: define the catalog, schema, and model name for your UC model
model_name = "sapgraph"
UC_MODEL_NAME = f"{catalog}.{schema}.{model_name}"

# Register the logged agent model to Unity Catalog
uc_registered_model_info = mlflow.register_model(
    model_uri=logged_agent_info.model_uri,
    name=UC_MODEL_NAME
)

# Deploy the model to the review app and a model serving endpoint;
# Neo4j credentials are injected from Databricks secrets
agents.deploy(
    UC_MODEL_NAME,
    uc_registered_model_info.version,
    scale_to_zero_enabled=True,
    environment_vars={
        "NEO4J_HOST": "{{secrets/your-secret-scope/neo4j-host}}",
        "NEO4J_KEY": "{{secrets/your-secret-scope/neo4j-key}}"
    }
)

AI Playground

The Databricks AI Playground simplifies the process of building agent chains by offering a safe, isolated environment for GenAI engineers to experiment with various models, analytical strategies, and AI agents without impacting the production environment. Within the Playground, teams can rapidly prototype and switch between the latest foundation models, such as Claude and Llama. Tight integration with DBSQL allows you to create tools from SQL functions and add them to your agent directly from the UI. Additionally, source code can be exported and used as a starting point for agent development in notebooks, where you can continue to develop and test more complex logical steps. In fact, the code used in this example began as an export from the Playground.

Model Serving & Databricks Apps

Calling the agents.deploy() command automatically sets up a Mosaic AI Model Serving endpoint, making the agent available as a REST API that you can integrate with your applications. We can then deploy a lightweight application on top of the serving endpoint so that non-technical business users can chat with the agent.
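For example, any client can call the endpoint over HTTP. A minimal sketch, where the workspace URL, endpoint name, and token variables are placeholders you would substitute with your own:

import os
import requests

# Placeholders: your workspace URL and the endpoint name assigned at deploy time
DATABRICKS_HOST = os.getenv("DATABRICKS_HOST")
ENDPOINT_NAME = "your-agent-endpoint-name"

response = requests.post(
    f"{DATABRICKS_HOST}/serving-endpoints/{ENDPOINT_NAME}/invocations",
    headers={"Authorization": f"Bearer {os.getenv('DATABRICKS_TOKEN')}"},
    json={"messages": [{"role": "user", "content": "Which suppliers ship to EMEA?"}]},
)
print(response.json())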

Databricks Apps supports common frameworks like Streamlit, Dash, and Gradio. In our example, we deploy a Streamlit app that exposes a chat interface to users. Both Mosaic AI Model Serving and Databricks Apps are low-latency and serverless, allowing you to easily scale to many end users and providing fast response times suitable for production workloads.
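A chat interface along these lines can be quite small. A simplified Streamlit sketch (the endpoint name is a placeholder; the app in the repository is more complete):

import streamlit as st
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import ChatMessage, ChatMessageRole

w = WorkspaceClient()  # resolves credentials from the Apps environment

st.title("Supply Chain Agent")

if prompt := st.chat_input("Ask about the supply chain"):
    st.chat_message("user").write(prompt)
    # Query the agent's serving endpoint (placeholder name)
    resp = w.serving_endpoints.query(
        name="your-agent-endpoint-name",
        messages=[ChatMessage(role=ChatMessageRole.USER, content=prompt)],
    )
    st.chat_message("assistant").write(resp.choices[0].message.content)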

Now for the fun part: chatting with the agent! Because the agent can retrieve supply chain context directly from the graph, we can ask it business-specific questions like:

  • What product categories are most frequently associated with top-selling products, and how do these categories impact overall sales?
  • What are the typical paths through which products move from suppliers to customers, and how can these paths be optimized?
  • Which employees have the most relationships with business partners? Do any of these relationships overlap?
  • What is the flow of products between regions? Is this optimized?

The chat agent utilizes our business's specific data to answer questions about the bike supply chain. Tracing the agent's reasoning, we can see that it opted to execute multiple graph queries using the tool and then synthesized the results.

 


The final but potentially most crucial piece of the development lifecycle for agents is ensuring the responses are correct, consistent, and high-quality. This is where Databricks sets itself apart from other solutions. Agent output can be easily improved using feedback from humans and LLMs through Agent Evaluation.

Evaluation

Databricks offers distinct advantages for building AI agents through its Mosaic AI offering. A key differentiator is its robust evaluation framework, which is critical for maintaining the quality and reliability of AI agents. The framework supports custom metrics and rules tailored to specific business needs, enabling precise monitoring of agent performance. Databricks also facilitates the collection of expert feedback through user-friendly applications, supporting continuous improvement of AI agent systems and fine-tuning of models to align closely with organizational goals.
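To make this concrete, here is a sketch of running Agent Evaluation through the standard mlflow.evaluate API. The evaluation set below is illustrative, not from the repository:

import mlflow
import pandas as pd

# Illustrative evaluation set: requests plus optional reference answers
eval_df = pd.DataFrame(
    {
        "request": ["Which product category generates the most revenue?"],
        "expected_response": ["Mountain bikes account for the largest share of revenue."],
    }
)

# model_type="databricks-agent" invokes Mosaic AI Agent Evaluation,
# which scores responses with built-in LLM judges
results = mlflow.evaluate(
    model=logged_agent_info.model_uri,  # the agent logged earlier
    data=eval_df,
    model_type="databricks-agent",
)
print(results.metrics)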

Conclusion 

Organizations aiming to modernize supply chain operations are increasingly adopting AI-driven analytics to gain deeper insights and make faster decisions. To support this transformation, teams must integrate ERP data—which is often siloed and complex—with advanced AI techniques such as Retrieval-Augmented Generation (RAG) and Graph Retrieval-Augmented Generation (GraphRAG). RAG enables flexible analysis of unstructured data, including textual records and historical documents. In contrast, GraphRAG provides a more structured approach by modeling relationships between entities, which makes it especially effective for analyzing complex supplier-to-customer networks.

Databricks provides a unified platform for implementing these approaches through its open Lakehouse, Unity Catalog, and Mosaic AI. These tools offer centralized governance, streamlined data access, and seamless development workflows. Teams can rapidly build, evaluate, and deploy AI agents that utilize either RAG or GraphRAG, shortening time to insight and improving operational efficiency. 

The Databricks-SAP partnership strengthens this integration by connecting SAP ERP systems directly with the Databricks Lakehouse. This collaboration allows organizations to transform operational data into real-time, actionable intelligence and accelerate supply chain optimization.

Explore how the Databricks-SAP partnership can help your team unlock the full potential of AI-driven supply chain insights.