DAIS 2026 · Speaker Spotlight
A conversation with Archika Dogra
On turning millions of messy enterprise documents into agent-ready data — and the intelligent document processing pipelines that make agents actually work in production.
The Session
Location: San Francisco + Virtual
The DAIS 2026 Speaker Spotlight is a series where we hand the mic to the speakers heading to Data + AI Summit and let them answer five short questions — in their own voice, no press-release polish.
Below, Archika Dogra on why the best agents are downstream of the best data — and the intelligent document processing primitives turning messy enterprise PDFs into structured, agent-ready context. Lightly edited for length — otherwise, the words are hers.
“The best teams know the best agents are downstream of the best data.”
— Archika Dogra
The topic
What is your talk about, and who is it for?
How to turn millions of messy enterprise documents into agent-ready data you can use to (1) automate document-heavy business processes and (2) power highly accurate, efficient agents.
Why this, why now
What's changed in the last 6–12 months that makes this topic urgent right now?
Every enterprise is automating document extraction workflows and shipping agents on top of document data. But these agentic workflows are only as smart as their ability to read the documents holding your enterprise context. Our research team's OfficeQA benchmark found that even frontier agent harnesses score below 50% on real enterprise document tasks. When the documents were parsed with ai_parse_document on Databricks, accuracy improved by 16%. Clean, structured document data powers more accurate, reliable results.
The personal stake
Why are you the person giving this talk?
I lead Product for AI Functions: powerful, scalable building blocks for turning unstructured data into structured insights. We've shipped the intelligent document processing primitives (ai_parse_document, ai_extract, ai_classify) behind thousands of customer pipelines, and we've seen first-hand how customers extract insights from documents ranging from millions of clinical notes to hundred-page financial filings.
What you'll leave with
What will someone be able to do on Monday morning that they couldn't do before?
You'll walk out knowing how to do two things. First, automate the document-heavy processes draining your org's time and resources — invoices, contracts, claims, filings — with intelligent document processing. Second, build pipelines that scale from thousands to millions of documents, so your agents can read and reason over every PDF in your enterprise.
The bigger picture
How does this fit into where Databricks — and data and AI more broadly — is heading?
The era of enterprise AI running on unstructured data is already here, and the best teams know the best agents are downstream of the best data. The teams figuring out how to turn raw documents into structured, governed, agent-ready data are the ones whose automations and agents actually work in production. That's the shift happening right now, and this session is built to help you get ahead of it.
A note from us
Speakers are the heart of DAIS, and helping the world hear your story is one of the best parts of our job.
Part of the DAIS 2026 Speaker Spotlight series — more voices dropping in the weeks ahead. Got a DAIS speaker you'd love to hear from next? Mention them in the comments — we're always listening.