<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Azure Document Intelligence in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/azure-document-intelligence/m-p/82887#M36766</link>
    <description>&lt;P&gt;&lt;A href="https://azure.microsoft.com/en-us/products/ai-services/ai-document-intelligence" target="_blank"&gt;Azure AI Document Intelligence | Microsoft Azure&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Does anyone have experience ingesting outputs from Azure Document Intelligence and/or know of some guides on how best to ingest this data? Specifically we are looking to ingest tax form data that has been processed by Document Intelligence, but open to any patterns/examples.&lt;/P&gt;&lt;P&gt;Information that would be helpful:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Example code sets&lt;/LI&gt;&lt;LI&gt;How the data was modeled after ingestion&lt;/LI&gt;&lt;LI&gt;How to use the model id to determine if a schema has changed and how to handle that in the ingestion pipeline&lt;/LI&gt;&lt;LI&gt;etc.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
    <pubDate>Tue, 13 Aug 2024 15:30:30 GMT</pubDate>
    <dc:creator>js54123875</dc:creator>
    <dc:date>2024-08-13T15:30:30Z</dc:date>
    <item>
      <title>Azure Document Intelligence</title>
      <link>https://community.databricks.com/t5/data-engineering/azure-document-intelligence/m-p/82887#M36766</link>
      <description>&lt;P&gt;&lt;A href="https://azure.microsoft.com/en-us/products/ai-services/ai-document-intelligence" target="_blank"&gt;Azure AI Document Intelligence | Microsoft Azure&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Does anyone have experience ingesting outputs from Azure Document Intelligence and/or know of some guides on how best to ingest this data? Specifically we are looking to ingest tax form data that has been processed by Document Intelligence, but open to any patterns/examples.&lt;/P&gt;&lt;P&gt;Information that would be helpful:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Example code sets&lt;/LI&gt;&lt;LI&gt;How the data was modeled after ingestion&lt;/LI&gt;&lt;LI&gt;How to use the model id to determine if a schema has changed and how to handle that in the ingestion pipeline&lt;/LI&gt;&lt;LI&gt;etc.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Tue, 13 Aug 2024 15:30:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/azure-document-intelligence/m-p/82887#M36766</guid>
      <dc:creator>js54123875</dc:creator>
      <dc:date>2024-08-13T15:30:30Z</dc:date>
    </item>
    <item>
      <title>Re: Azure Document Intelligence</title>
      <link>https://community.databricks.com/t5/data-engineering/azure-document-intelligence/m-p/82943#M36787</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/72142"&gt;@js54123875&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;&lt;P&gt;&lt;A href="https://azure.microsoft.com/en-us/products/ai-services/ai-document-intelligence" target="_blank" rel="noopener"&gt;Azure AI Document Intelligence | Microsoft Azure&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Does anyone have experience ingesting outputs from Azure Document Intelligence and/or know of some guides on how best to ingest this data? Specifically we are looking to ingest tax form data that has been processed by Document Intelligence, but open to any patterns/examples.&lt;/P&gt;&lt;P&gt;Information that would be helpful:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Example code sets&lt;/LI&gt;&lt;LI&gt;How the data was modeled after ingestion&lt;/LI&gt;&lt;LI&gt;How to use the model id to determine if a schema has changed and how to handle that in the ingestion pipeline&lt;/LI&gt;&lt;LI&gt;etc.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Thanks!&lt;/P&gt;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;Ingesting outputs from Azure Document Intelligence, especially tax form data, is straightforward with the right approach. Here are some tips to help you get started:&lt;/P&gt;&lt;P&gt;Example Code Sets&lt;BR /&gt;Azure Document Intelligence provides SDKs for several languages, including C#, Python, Java, and JavaScript. Here’s a basic Python example, using the azure-ai-formrecognizer package, that extracts fields from a US 1040 tax form:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

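# Placeholders: replace with your own resource endpoint and API key.
endpoint = "YOUR_FORM_RECOGNIZER_ENDPOINT"
key = "YOUR_FORM_RECOGNIZER_KEY"

# Client used to submit documents for analysis.
document_analysis_client = DocumentAnalysisClient(endpoint=endpoint, credential=AzureKeyCredential(key))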

with open("path/to/your/taxform.pdf", "rb") as f:
    poller = document_analysis_client.begin_analyze_document("prebuilt-tax.us.1040", document=f)
    result = poller.result()
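
# Hedged sketch, not part of the original reply: keep the model ID next to the
# extracted values so each stored row records which model produced it.
# `build_record` and the flat dict layout are hypothetical choices.
def build_record(model_id, field_values):
    """Return one flat dict: the model ID plus one entry per extracted field."""
    record = {"model_id": model_id}
    for field_name, field_value in field_values.items():
        record[field_name] = field_value
    return record

# Possible usage once `result` is available (the SDK reports result.model_id):
# build_record(result.model_id, {n: f.value for n, f in result.documents[0].fields.items()})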

for document in result.documents:
    for name, field in document.fields.items():
        print(f"{name}: {field.value}")&lt;/LI-CODE&gt;&lt;P&gt;Data Modeling After Ingestion&lt;BR /&gt;Once the data is extracted, you can store it in a structured format such as JSON or relational tables. For example, you might create one table per tax form type (e.g., W-2, 1099), with a column for each extracted field.&lt;/P&gt;&lt;P&gt;Handling Schema Changes&lt;BR /&gt;The analysis result reports the ID of the model that produced it (result.model_id in the Python SDK), and Azure Document Intelligence versions its models. Comparing the model ID of each new result with the one you stored previously tells you when the model, and possibly the set of extracted fields, has changed.&lt;/P&gt;&lt;P&gt;Here’s a conceptual approach:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Store the Model ID: Save the model ID used to process each document.&lt;/LI&gt;&lt;LI&gt;Check for Updates: Compare the model ID of each new result with the stored one.&lt;/LI&gt;&lt;LI&gt;Update Schema: If the ID has changed, update your ingestion pipeline to accommodate the new schema.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Hope this helps.&lt;BR /&gt;Best regards,&lt;BR /&gt;florence023&lt;/P&gt;</description>
      <pubDate>Wed, 14 Aug 2024 10:02:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/azure-document-intelligence/m-p/82943#M36787</guid>
      <dc:creator>florence023</dc:creator>
      <dc:date>2024-08-14T10:02:14Z</dc:date>
    </item>
    <item>
      <title>Re: Azure Document Intelligence</title>
      <link>https://community.databricks.com/t5/data-engineering/azure-document-intelligence/m-p/82946#M36788</link>
      <description>&lt;P&gt;Thanks for sharing&lt;/P&gt;</description>
      <pubDate>Wed, 14 Aug 2024 10:22:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/azure-document-intelligence/m-p/82946#M36788</guid>
      <dc:creator>Ajay-Pandey</dc:creator>
      <dc:date>2024-08-14T10:22:26Z</dc:date>
    </item>
    <item>
      <title>Re: Azure Document Intelligence</title>
      <link>https://community.databricks.com/t5/data-engineering/azure-document-intelligence/m-p/82965#M36794</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/72142"&gt;@js54123875&lt;/a&gt;, Thanks for reaching out! Please review the response and let us know if it answers your question. Your feedback is valuable to us and the community.&lt;/P&gt;
&lt;P&gt;If the response resolves your issue, kindly mark it as the accepted solution. This will help close the thread and assist others with similar queries.&lt;/P&gt;
&lt;P&gt;We appreciate your participation and are here if you need further assistance!&lt;/P&gt;</description>
      <pubDate>Wed, 14 Aug 2024 11:49:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/azure-document-intelligence/m-p/82965#M36794</guid>
      <dc:creator>Retired_mod</dc:creator>
      <dc:date>2024-08-14T11:49:01Z</dc:date>
    </item>
  </channel>
</rss>

