Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Azure Document Intelligence

js54123875
New Contributor III

Azure AI Document Intelligence | Microsoft Azure

Does anyone have experience ingesting outputs from Azure Document Intelligence and/or know of some guides on how best to ingest this data? Specifically we are looking to ingest tax form data that has been processed by Document Intelligence, but open to any patterns/examples.

Information that would be helpful:

  • Example code sets
  • How the data was modeled after ingestion
  • How to use the model id to determine if a schema has changed and how to handle that in the ingestion pipeline
  • etc.

Thanks!

3 REPLIES

florence023
New Contributor III

Hi there!

Ingesting outputs from Azure Document Intelligence, especially for tax form data, can be streamlined with the right approach. Here are some resources and tips to help you get started:

Example Code Sets
Azure Document Intelligence provides SDKs in various languages, including C#, Python, Java, and JavaScript. Here’s a basic example in Python to extract data from a tax form:

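from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

# Replace with your resource's endpoint and key (ideally read from a secret store).
endpoint = "YOUR_FORM_RECOGNIZER_ENDPOINT"
key = "YOUR_FORM_RECOGNIZER_KEY"

document_analysis_client = DocumentAnalysisClient(endpoint=endpoint, credential=AzureKeyCredential(key))

# Submit the PDF to the prebuilt US 1040 model and wait for the analysis to finish.
# The model must be available for the API version your resource and SDK use.
with open("path/to/your/taxform.pdf", "rb") as f:
    poller = document_analysis_client.begin_analyze_document("prebuilt-tax.us.1040", document=f)
    result = poller.result()

# The result records which model produced it; keep this alongside the extracted data.
print(f"Model ID: {result.model_id}")

# Print each extracted field with its confidence score.
for document in result.documents:
    for name, field in document.fields.items():
        print(f"{name}: {field.value} (confidence: {field.confidence})")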

Data Modeling After Ingestion
Once the data is extracted, you can model it in a structured format such as JSON or a relational database. For example, you might create tables for different tax forms (e.g., W-2, 1099) with columns representing the extracted fields.
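As a rough illustration of one way to model this, here is a minimal sketch that flattens extracted fields into one row per field (a "long" table that tolerates forms with different field sets). It assumes each field has already been reduced to a plain dict with `value` and `confidence` keys; that shape, the function names `flatten_fields` and `document_to_rows`, and the sample values are all made up for the example rather than taken from the SDK.

```python
from typing import Any, Dict, Iterator, List


def flatten_fields(fields: Dict[str, Dict[str, Any]], prefix: str = "") -> Iterator[Dict[str, Any]]:
    """Yield one row per leaf field; nested fields get dotted names."""
    for name, field in fields.items():
        path = f"{prefix}{name}"
        value = field.get("value")
        if isinstance(value, dict):
            # Object field: its value is a dict of further fields.
            yield from flatten_fields(value, prefix=f"{path}.")
        elif isinstance(value, list):
            # Array field: each item is itself a field dict; index it by position.
            for index, item in enumerate(value):
                yield from flatten_fields({str(index): item}, prefix=f"{path}.")
        else:
            yield {"field_name": path, "value": value, "confidence": field.get("confidence")}


def document_to_rows(doc: Dict[str, Any], model_id: str, source_file: str) -> List[Dict[str, Any]]:
    """Attach lineage columns (file, model ID, doc type) to every field row."""
    return [
        {"source_file": source_file, "model_id": model_id, "doc_type": doc["doc_type"], **row}
        for row in flatten_fields(doc["fields"])
    ]


# Invented sample data in the assumed shape (not real service output).
sample_doc = {
    "doc_type": "tax.us.1040",
    "fields": {
        "TaxYear": {"value": "2022", "confidence": 0.98},
        "Taxpayer": {
            "value": {"FirstName": {"value": "Ana", "confidence": 0.90}},
            "confidence": 0.95,
        },
        "Dependents": {
            "value": [
                {"value": {"Name": {"value": "Sam", "confidence": 0.80}}, "confidence": 0.85}
            ],
            "confidence": 0.85,
        },
    },
}

rows = document_to_rows(sample_doc, "prebuilt-tax.us.1040", "taxform.pdf")
for row in rows:
    print(row)
```

Rows in this shape can be loaded into a single table and pivoted into per-form tables (W-2, 1099, and so on) later. Keeping `model_id` on every row means you can always tell which model produced a given value.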

Handling Schema Changes
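To handle schema changes, record the model ID that produced each result. Every analysis result reports the model ID (and the API version) that was used, so your pipeline can compare it with the value it expects and flag a mismatch. For custom models, a retrained model typically gets a new model ID. For prebuilt models the ID stays the same across releases, so it is worth tracking the API version as well.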

Here’s a conceptual approach:

  • Store the Model ID: Save the model ID used for each processed document alongside the extracted data.
  • Check for Updates: On each run, compare the model ID on the incoming result with the one already recorded for that form type.
  • Update Schema: If a change is detected, review the new field set and update your ingestion pipeline to accommodate it.
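The approach above could be sketched roughly as follows. This is only an illustration: the `known_schemas` registry (a dict mapping a model ID to the set of field names seen so far), the function name `check_schema`, and the status strings are hypothetical choices, not anything provided by Azure or Databricks. In practice the registry would probably live in a table rather than in memory.

```python
from typing import Any, Dict, Iterable, Set


def check_schema(
    known_schemas: Dict[str, Set[str]],
    model_id: str,
    observed_fields: Iterable[str],
) -> Dict[str, Any]:
    """Compare the fields in one result with those recorded for its model ID."""
    observed = set(observed_fields)
    expected = known_schemas.get(model_id)
    if expected is None:
        # Model ID never seen before: treat the whole field set as new.
        return {"status": "new_model", "new_fields": sorted(observed), "missing_fields": []}
    new_fields = sorted(observed - expected)
    # A field missing from one document may just be blank on that form,
    # so missing fields are reported but do not change the status.
    missing_fields = sorted(expected - observed)
    status = "new_fields" if new_fields else "ok"
    return {"status": status, "new_fields": new_fields, "missing_fields": missing_fields}


# Hypothetical registry built from earlier runs.
known_schemas = {"prebuilt-tax.us.1040": {"TaxYear", "Taxpayer.FirstName", "FilingStatus"}}

same = check_schema(known_schemas, "prebuilt-tax.us.1040", ["TaxYear", "Taxpayer.FirstName"])
drift = check_schema(known_schemas, "prebuilt-tax.us.1040", ["TaxYear", "Spouse.FirstName"])
unseen = check_schema(known_schemas, "my-custom-model-v2", ["TaxYear"])
print(same["status"], drift["status"], unseen["status"])
```

A pipeline could route anything that is not `ok` to a quarantine table for review instead of failing the whole load. If you also want to catch changes in prebuilt models, the registry key could combine the model ID with the API version.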

Hope this helps.
Best regards,
florence023

Ajay-Pandey
Esteemed Contributor III

Thanks for sharing

Ajay Kumar Pandey

Retired_mod
Esteemed Contributor III

Hi @js54123875, Thanks for reaching out! Please review the response and let us know if it answers your question. Your feedback is valuable to us and the community.

If the response resolves your issue, kindly mark it as the accepted solution. This will help close the thread and assist others with similar queries.

We appreciate your participation and are here if you need further assistance!
