
Building Point-in-Time Correctness for LLM Agents on Databricks

Pavan_suresh
New Contributor II

A delivery truck arrives at a downtown Seattle coffee shop carrying 8,000 gallons of oat milk. The regional manager is furious. The AI agent that manages supply chain decisions made the call autonomously at 08:14 AM on a rainy 45F morning. Nobody ordered that much oat milk. Nobody asked for it. The manager calls Emma, the AI engineer, and says the words every AI team dreads: "Turn it off. It is hallucinating."

Emma does not turn it off. Instead, she opens her notebook.

This is the story of what she finds, and the architecture that made it possible.

The Black Box Problem

Modern LLM agents make decisions continuously. They read data, reason over it, and take actions. The problem is that most of the infrastructure they run on is built for the present. A standard database shows you what is true right now. It does not show you what was true at 08:14:23 AM on a specific Tuesday when a specific decision was made.

When something goes wrong, you are left with an application log that says:

 
08:14 AM - SupplyAgent triggered
08:15 AM - Decision: Order 8000 gallons oat milk to Downtown
08:15 AM - Reasoning: Anticipating unprecedented iced-latte demand due to extreme heat and inventory depletion

And a live database that shows 118 gallons of inventory and 45F rainy weather. The two realities do not match. Without a way to reconstruct the past, you cannot explain the present. You call it a hallucination and move on. The actual problem goes unfixed.

What Emma Actually Found

Using Delta Lake Time Travel, Emma reconstructs the exact state of the world at the moment SupplyAgent made its decision.

 
 
spark.sql("""
    SELECT store, inventory_gallons, weather_temp_f, weather_condition
    FROM workspace.supply_agent_demo.world_state
    VERSION AS OF 1
    WHERE store = 'Downtown'
    ORDER BY event_timestamp DESC
    LIMIT 1
""")

The query returns:

 
Downtown | -999 | 102 | Extreme Heatwave

At 08:12 AM, a faulty IoT sensor in the downtown fridge reported inventory as -999 gallons, triggering an emergency restock flag. Simultaneously, a glitch in the third-party weather API pushed a 102F heatwave warning for Seattle. At 08:16 AM, both systems self-corrected and overwrote the bad data in the live table. The evidence was erased four minutes after the damage was done.

The LLM did not hallucinate. It made a completely logical decision based on the data it was given. The data pipeline was poisoned.

The Architecture

The project is built across four notebooks, orchestrated as a single Databricks Workflow.

Notebook 1 simulates the world state table, a Delta Lake table that stores IoT sensor and weather API readings over time. Three versions are written. Version 0 is the normal baseline. Version 1 is the corrupted window. Version 2 is the self-correction. Each write is tagged with userMetadata so the pipeline can resolve versions dynamically without any hardcoded integers.
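The version lookup described above can be sketched as follows. This is a minimal illustration, not the repository's code: the tag values (baseline, corrupted_window, self_correction) and the helper names are assumptions, and the sample list stands in for the rows that DESCRIBE HISTORY returns, which include the version and userMetadata columns.

```python
from typing import Any, Iterable, Mapping


def resolve_version(history: Iterable[Mapping[str, Any]], tag: str) -> int:
    """Return the newest table version whose userMetadata equals `tag`."""
    matches = [row["version"] for row in history if row.get("userMetadata") == tag]
    if not matches:
        raise LookupError(f"no commit tagged {tag!r}")
    return max(matches)


def history_rows(spark, table: str) -> list:
    """Fetch DESCRIBE HISTORY as plain dicts (requires a live Spark session)."""
    return [r.asDict() for r in spark.sql(f"DESCRIBE HISTORY {table}").collect()]


# Stand-in for DESCRIBE HISTORY output; the tag names are hypothetical.
sample_history = [
    {"version": 2, "userMetadata": "self_correction"},
    {"version": 1, "userMetadata": "corrupted_window"},
    {"version": 0, "userMetadata": "baseline"},
]

corrupted_version = resolve_version(sample_history, "corrupted_window")
query = (
    "SELECT * FROM workspace.supply_agent_demo.world_state "
    f"VERSION AS OF {corrupted_version}"
)
print(query)
```

On a cluster, history_rows(spark, "workspace.supply_agent_demo.world_state") would replace the sample list. The tag itself is attached at write time, for example with the userMetadata option on the Delta writer.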

Notebook 2 runs SupplyAgent. The agent reads the world state filtered by event timestamp at three simulated decision times, 07:30, 08:14, and 08:20. At each cycle it passes the store context to Llama 3.3 70B via Databricks Foundation Model APIs and writes the LLM-generated decision and reasoning to an agent memory table with two timestamps: action_timestamp, when the agent decided, and system_timestamp, when the record was committed to Delta. This is the bi-temporal model.
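A bi-temporal memory row could be modeled roughly like this. The two timestamp columns come from the description above; the class name, the other fields, the helper, and the calendar date are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AgentMemoryRecord:
    store: str
    decision: str
    reasoning: str
    action_timestamp: datetime  # when the agent decided
    system_timestamp: datetime  # when the record was committed


def make_record(store: str, decision: str, reasoning: str,
                action_timestamp: datetime,
                system_timestamp: Optional[datetime] = None) -> AgentMemoryRecord:
    """Stamp the commit time at write time unless one is supplied (e.g. in tests)."""
    committed = system_timestamp or datetime.now(timezone.utc)
    return AgentMemoryRecord(store, decision, reasoning, action_timestamp, committed)


# The date is a placeholder; the article only gives times of day.
record = make_record(
    store="Downtown",
    decision="Order 8000 gallons oat milk to Downtown",
    reasoning="Anticipating unprecedented iced-latte demand",
    action_timestamp=datetime(2025, 1, 7, 8, 14, tzinfo=timezone.utc),
    system_timestamp=datetime(2025, 1, 7, 8, 15, tzinfo=timezone.utc),
)

lag = record.system_timestamp - record.action_timestamp
row = asdict(record)  # plain dict, ready to become a DataFrame row
print(lag)
```

Keeping both timestamps on every row is what lets a later audit ask two separate questions: when did the agent act, and when did the system learn about it.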

Notebook 3 is the investigation. It demonstrates three things. First, Delta Time Travel using VERSION AS OF to reconstruct exactly what the agent saw at 08:14. Second, a bi-temporal audit query joining agent memory with world state to show inventory_agent_saw=-999 against inventory_live_now=118 side by side. Third, OPTIMIZE with ZORDER to compact the many small files, and the metadata overhead that comes with them, that high-frequency agent writes produce.
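The logic of the bi-temporal audit can be sketched in plain Python, with in-memory dictionaries standing in for the Delta table versions. The values -999 and 118 and the 08:12 and 08:16 commit times are taken from the story. The date, the function names, and the assumption that version 2 is the live state are illustrative only. In the notebook the same comparison would be a join over a VERSION AS OF read and the current table.

```python
from datetime import datetime

DAY = (2025, 1, 7)  # hypothetical date; the article only gives times of day


def at(hour: int, minute: int) -> datetime:
    return datetime(*DAY, hour, minute)


# (version, commit time): version 1 is the corrupted write, version 2 the correction.
COMMITS = [(1, at(8, 12)), (2, at(8, 16))]
# Inventory per store as recorded in each version.
SNAPSHOTS = {1: {"Downtown": -999}, 2: {"Downtown": 118}}


def version_as_of(commits, moment):
    """Newest version committed at or before `moment`."""
    visible = [version for version, committed in commits if committed <= moment]
    if not visible:
        raise LookupError(f"no version existed at {moment}")
    return max(visible)


def audit(decisions, commits, snapshots):
    """Pair what each decision saw with what the table shows now."""
    latest = max(version for version, _ in commits)
    rows = []
    for decision in decisions:
        seen = version_as_of(commits, decision["action_timestamp"])
        rows.append({
            "action_timestamp": decision["action_timestamp"],
            "version_seen": seen,
            "inventory_agent_saw": snapshots[seen][decision["store"]],
            "inventory_live_now": snapshots[latest][decision["store"]],
        })
    return rows


decisions = [
    {"store": "Downtown", "action_timestamp": at(8, 14)},
    {"store": "Downtown", "action_timestamp": at(8, 20)},
]
report = audit(decisions, COMMITS, SNAPSHOTS)
mismatches = [r for r in report if r["inventory_agent_saw"] != r["inventory_live_now"]]
for r in mismatches:
    print(r)
```

Only the 08:14 decision is flagged: it read version 1, while the 08:20 decision read the corrected data.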

Notebook 4 creates and publishes a Lakeview AI/BI Dashboard programmatically using the Databricks SDK. The dashboard renders the sensor timeline with the corrupted reading highlighted, the agent decision log with the LLM reasoning at each cycle, the bi-temporal audit table, and the full Delta commit history.

The entire pipeline runs end to end as a four-task Databricks Workflow with serverless compute on Free Edition.
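A four-task chain like this one might be described to the Jobs API with a payload along these lines. This is a sketch under assumptions: the job name, task keys, and notebook paths are hypothetical, and leaving out any cluster specification is assumed here to be how the tasks end up on serverless compute.

```python
import json


def build_job_payload(name, notebook_paths):
    """Build a Jobs API style payload where each task depends on the previous one."""
    tasks = []
    previous_key = None
    for index, path in enumerate(notebook_paths, start=1):
        key = f"notebook{index}"
        task = {"task_key": key, "notebook_task": {"notebook_path": path}}
        if previous_key is not None:
            task["depends_on"] = [{"task_key": previous_key}]
        tasks.append(task)
        previous_key = key
    return {"name": name, "tasks": tasks}


# Hypothetical workspace paths; substitute the real notebook locations.
payload = build_job_payload(
    "supply-agent-audit",
    [
        "/Workspace/supply_agent_demo/notebook1",
        "/Workspace/supply_agent_demo/notebook2",
        "/Workspace/supply_agent_demo/notebook3",
        "/Workspace/supply_agent_demo/notebook4",
    ],
)
print(json.dumps(payload, indent=2))
```

The linear depends_on chain keeps the investigation from running before the agent has written its memory rows, and keeps the dashboard from being built before the audit tables exist.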

The Bi-Temporal Insight

Standard databases track one timeline: when data was written. This is called transaction time. But data also has a second timeline: when the event actually occurred in the real world. This is called valid time or event time.

When a faulty sensor logs a bad reading at 08:12, but the correction arrives at 08:16, a standard database shows you the corrected state and nothing else. Both timelines are collapsed into one. The window of corruption is invisible.

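Delta Lake preserves both. Every commit is immutable. You can query any past version that is still within the table's retention window, so history that has been vacuumed or aged out of the log is no longer reachable. The agent memory table stores both action_timestamp and system_timestamp on every row. The gap between these two columns is exactly where the audit trail lives.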

This is not just a debugging technique. As LLM agents take on more autonomous decision making in enterprise environments, the ability to answer the question "what did the agent know, and when did it know it" becomes a compliance and governance requirement. Delta Lake provides the infrastructure to answer that question. This project demonstrates how to build on top of it.

Connection to Lakebase

The core problem Lakebase addresses is exactly what this project demonstrates: agents need persistent, queryable, time-aware memory to be auditable and trustworthy. This project builds that memory layer from first principles using Delta Lake, Databricks Workflows, and Foundation Model APIs, all available on Free Edition today. 

What I Learned

The most important thing this project reinforced is that the reliability of an AI agent is only as good as the reliability of the data it reads. You can have the most capable LLM in the world, but if a sensor reports -999 gallons and a weather API reports 102F, the agent will make the wrong call. It will make it confidently. And without a temporal data architecture underneath it, you will never be able to prove what actually happened.

Delta Lake Time Travel is not just a recovery tool. It is an auditability primitive for the agentic era.

Resources

GitHub Repository: https://github.com/Pavan-249/supplychain-audit-databricks

The repository contains five notebooks: notebook0, which creates the Databricks Workflow, and the four pipeline notebooks it orchestrates. notebook1 simulates the world state. notebook2 runs the LLM agent. notebook3 performs the investigation. notebook4 creates the Lakeview dashboard.
