<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Why Your Agents Can’t Read Enterprise Documents — and How to Fix It in Announcements</title>
    <link>https://community.databricks.com/t5/announcements/why-your-agents-can-t-read-enterprise-documents-and-how-to-fix/m-p/154935#M746</link>
    <description>&lt;P&gt;&lt;SPAN&gt;We’re announcing &lt;/SPAN&gt;&lt;STRONG&gt;Document Intelligence&lt;/STRONG&gt;&lt;SPAN&gt;, designed to tackle the biggest weak spot in today’s agents: reading real enterprise documents. It turns messy PDFs like contracts, claims, invoices, financial filings, and clinical notes into reliable structured data your agents can actually use, at enterprise scale. &lt;/SPAN&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;Document Intelligence is built on three core pillars: &lt;/SPAN&gt;&lt;STRONG&gt;Research-backed Accuracy&lt;/STRONG&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;STRONG&gt;Enterprise Scale&lt;/STRONG&gt;&lt;SPAN&gt;, and &lt;/SPAN&gt;&lt;STRONG&gt;End-to-end Simplicity&lt;/STRONG&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="p8i6j01 paragraph"&gt;&lt;A style="background-color: #ff3621; color: white; padding: 10px 20px; text-decoration: none; border-radius: 5px; font-weight: bold; display: inline-block;" href="https://www.databricks.com/blog/why-frontier-agents-cant-read-documents-and-how-were-fixing-it?utm_source=bambu&amp;amp;utm_medium=social&amp;amp;utm_campaign=advocacy" target="_blank" rel="noopener"&gt;&lt;span class="lia-unicode-emoji" title=":eyes:"&gt;👀&lt;/span&gt; Read the full post here&amp;nbsp;&lt;span class="lia-unicode-emoji" title=":backhand_index_pointing_left:"&gt;👈&lt;/span&gt;&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 20 Apr 2026 11:56:04 GMT</pubDate>
    <dc:creator>Tushar_Parekar</dc:creator>
    <dc:date>2026-04-20T11:56:04Z</dc:date>
    <item>
      <title>Why Your Agents Can’t Read Enterprise Documents — and How to Fix It</title>
      <link>https://community.databricks.com/t5/announcements/why-your-agents-can-t-read-enterprise-documents-and-how-to-fix/m-p/154935#M746</link>
      <description>&lt;P&gt;&lt;SPAN&gt;We’re announcing &lt;/SPAN&gt;&lt;STRONG&gt;Document Intelligence&lt;/STRONG&gt;&lt;SPAN&gt;, designed to tackle the biggest weak spot in today’s agents: reading real enterprise documents. It turns messy PDFs like contracts, claims, invoices, financial filings, and clinical notes into reliable structured data your agents can actually use, at enterprise scale. &lt;/SPAN&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;Document Intelligence is built on three core pillars: &lt;/SPAN&gt;&lt;STRONG&gt;Research-backed Accuracy&lt;/STRONG&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;STRONG&gt;Enterprise Scale&lt;/STRONG&gt;&lt;SPAN&gt;, and &lt;/SPAN&gt;&lt;STRONG&gt;End-to-end Simplicity&lt;/STRONG&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Key highlights&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;STRONG&gt;Agents struggle to read real documents:&amp;nbsp;&lt;/STRONG&gt;&lt;SPAN&gt;Even frontier agents score below 50% on tasks built from real enterprise PDFs, because they misread what’s on the page, not because they can’t reason.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;STRONG&gt;Document processing is the accuracy ceiling:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;If extraction from contracts, claims, invoices, and other PDFs is wrong, every downstream agent decision is at risk.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;STRONG&gt;Research-backed improvements:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;On real treasury bond and OfficeQA-style documents, pre-processing with &lt;/SPAN&gt;&lt;STRONG&gt;ai_parse_document&lt;/STRONG&gt;&lt;SPAN&gt; delivers measurable accuracy gains for multiple agent frameworks.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;STRONG&gt;Chainable AI Functions:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;ai_parse_document (GA)&lt;/STRONG&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;STRONG&gt;ai_classify&lt;/STRONG&gt;&lt;SPAN&gt;, and &lt;/SPAN&gt;&lt;STRONG&gt;ai_extract&lt;/STRONG&gt;&lt;SPAN&gt; work together as a pipeline so you can parse once, then classify, extract, and re-extract without reprocessing documents.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;STRONG&gt;Enterprise scale and cost efficiency:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;In benchmarks and customer deployments, Document Intelligence achieves high extraction accuracy at 5-7x lower cost, with some workloads seeing nearly 90% lower cost while processing hundreds of millions of clinical notes.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;STRONG&gt;Runs natively on Databricks:&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;Ingest with Lakeflow Connect, orchestrate with Lakeflow Jobs or Spark Declarative Pipelines, govern with Unity Catalog, and power agents with Agent Bricks on top of the same enriched document data.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;SPAN&gt;In the full post, you’ll see OfficeQA results, real customer examples, and how &lt;/SPAN&gt;&lt;STRONG&gt;ai_parse_document&lt;/STRONG&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;STRONG&gt;ai_classify&lt;/STRONG&gt;&lt;SPAN&gt;, and &lt;/SPAN&gt;&lt;STRONG&gt;ai_extract&lt;/STRONG&gt;&lt;SPAN&gt; fit together in simple SQL to power document-heavy agent workflows on Databricks.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="p8i6j01 paragraph"&gt;&lt;A style="background-color: #ff3621; color: white; padding: 10px 20px; text-decoration: none; border-radius: 5px; font-weight: bold; display: inline-block;" href="https://www.databricks.com/blog/why-frontier-agents-cant-read-documents-and-how-were-fixing-it?utm_source=bambu&amp;amp;utm_medium=social&amp;amp;utm_campaign=advocacy" target="_blank" rel="noopener"&gt;&lt;span class="lia-unicode-emoji" title=":eyes:"&gt;👀&lt;/span&gt; Read the full post here&amp;nbsp;&lt;span class="lia-unicode-emoji" title=":backhand_index_pointing_left:"&gt;👈&lt;/span&gt;&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 20 Apr 2026 11:56:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/announcements/why-your-agents-can-t-read-enterprise-documents-and-how-to-fix/m-p/154935#M746</guid>
      <dc:creator>Tushar_Parekar</dc:creator>
      <dc:date>2026-04-20T11:56:04Z</dc:date>
    </item>
  </channel>
</rss>

