Announcements
Stay up-to-date with the latest announcements from Databricks. Learn about product updates, new features, and important news that impact your data analytics workflow.

Why Your Agents Can’t Read Enterprise Documents — and How to Fix It

Tushar_Parekar
Databricks Employee

We’re announcing Document Intelligence, designed to tackle the biggest weak spot in today’s agents: reading real enterprise documents. It turns messy PDFs like contracts, claims, invoices, financial filings, and clinical notes into reliable structured data your agents can actually use, at enterprise scale.

Document Intelligence is built on three core pillars: Research-backed Accuracy, Enterprise Scale, and End-to-end Simplicity.

Key highlights

  • Agents struggle to read real documents: Even frontier agents score below 50% on tasks built from real enterprise PDFs, because they misread what’s on the page, not because they can’t reason.
  • Document processing is the accuracy ceiling: If extraction from contracts, claims, invoices, and other PDFs is wrong, every downstream agent decision is at risk.
  • Research-backed improvements: On real treasury bond and OfficeQA-style documents, pre-processing with ai_parse_document delivers measurable accuracy gains for multiple agent frameworks.
  • Chainable AI Functions: ai_parse_document (GA), ai_classify, and ai_extract work together as a pipeline so you can parse once, then classify, extract, and re-extract without reprocessing documents.
  • Enterprise scale and cost efficiency: In benchmarks and customer deployments, Document Intelligence achieves high extraction accuracy at 5-7x lower cost, with some workloads seeing nearly 90% lower cost while processing hundreds of millions of clinical notes.
  • Runs natively on Databricks: Ingest with Lakeflow Connect, orchestrate with Lakeflow Jobs or Spark Declarative Pipelines, govern with Unity Catalog, and power agents with Agent Bricks on top of the same enriched document data.
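The "parse once, then classify, extract, and re-extract" idea from the Chainable AI Functions highlight can be illustrated with a small, self-contained Python sketch. It does not call Databricks or the real ai_parse_document, ai_classify, or ai_extract functions. Every name in it (ParsedDocument, ParseOnceStore, toy_parse, toy_classify, toy_extract) is a hypothetical stand-in, and the "documents" are plain text bytes rather than PDFs. It only shows the shape of the pattern: the expensive parse step runs once per document, and later steps reuse the stored result.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class ParsedDocument:
    """Hypothetical stand-in for the structured output of a parse step."""
    path: str
    text: str


class ParseOnceStore:
    """Caches parsed output so later steps never re-parse the raw file."""

    def __init__(self, parse_fn: Callable[[str, bytes], ParsedDocument]) -> None:
        self._parse_fn = parse_fn
        self._cache: Dict[str, ParsedDocument] = {}
        self.parse_calls = 0  # counts how often the expensive step actually ran

    def get(self, path: str, raw: bytes) -> ParsedDocument:
        if path not in self._cache:
            self.parse_calls += 1
            self._cache[path] = self._parse_fn(path, raw)
        return self._cache[path]


def toy_parse(path: str, raw: bytes) -> ParsedDocument:
    # Assumption: a real parser would handle PDFs; this one only decodes text.
    return ParsedDocument(path=path, text=raw.decode("utf-8"))


def toy_classify(doc: ParsedDocument, labels: List[str]) -> str:
    # Toy rule: return the first label that appears in the text, else "other".
    lowered = doc.text.lower()
    for label in labels:
        if label.lower() in lowered:
            return label
    return "other"


def toy_extract(doc: ParsedDocument, fields: List[str]) -> Dict[str, Optional[str]]:
    # Toy rule: read "field: value" lines; fields that are missing map to None.
    found: Dict[str, str] = {}
    for line in doc.text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in fields:
            found[key.strip().lower()] = value.strip()
    return {field: found.get(field) for field in fields}


store = ParseOnceStore(toy_parse)
raw = b"INVOICE\nvendor: Acme Corp\ntotal: 1200.50\ndue date: 2024-07-01\n"

# Parse once, then classify and extract from the stored result.
doc = store.get("invoice_001.pdf", raw)
label = toy_classify(doc, ["invoice", "contract", "claim"])
first = toy_extract(doc, ["vendor", "total"])

# Re-extract with an extra field: the cached parse is reused, not recomputed.
doc_again = store.get("invoice_001.pdf", raw)
second = toy_extract(doc_again, ["vendor", "total", "due date"])
```

After the second extraction, store.parse_calls is still 1. That is the property the highlight describes: changing what you extract does not mean reprocessing the document.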

In the full post, you’ll see OfficeQA results, real customer examples, and how ai_parse_document, ai_classify, and ai_extract fit together in simple SQL to power document-heavy agent workflows on Databricks.

👀 Read the full post here 👈
