Can I Replicate Azure Document Intelligence's Custom Table Extraction in Databricks?
05-14-2025 08:07 PM
I am using Azure Document Intelligence to get data from a table in a PDF file. The table's headers do not visually align with the values. Therefore, the standard and pre-built models cannot correctly read the data.
I have built a custom-trained Azure Document Intelligence model and can read the data perfectly. When I trained the model, I used the Azure Document Intelligence feature and first ran a layout scan of the PDF file. Then, I created a new table type field and manually labelled and aligned each value detected on the PDF to one cell in the table field. After adding 4 PDF files, I could train a reasonably good model.
Can I do the same or something similar using only Databricks features, without Azure Document Intelligence?
10-22-2025 04:30 PM
Hi @AlbertWang, you can achieve this using Agent Bricks: Information Extraction. Your PDFs are first converted to text with the ai_parse_document function, and the result is saved in a Databricks table. You can then create the agent on top of that text table to get the output in JSON format.
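As a rough sketch of the parsing step, the snippet below builds a SQL statement that reads the PDFs as binary files and stores the ai_parse_document output in a table. The volume path, the table name, and the helper `build_parse_query` are placeholders I made up for illustration, and the statement assumes ai_parse_document is enabled in your workspace.

```python
def build_parse_query(source_dir: str, target_table: str) -> str:
    """Return SQL that parses every file under source_dir into target_table.

    Both arguments are hypothetical examples; replace them with your own
    Unity Catalog volume path and table name.
    """
    if "'" in source_dir:
        raise ValueError("source_dir must not contain single quotes")
    return (
        f"CREATE OR REPLACE TABLE {target_table} AS\n"
        "SELECT path, ai_parse_document(content) AS parsed\n"
        f"FROM READ_FILES('{source_dir}', format => 'binaryFile')"
    )


# Hypothetical locations, for illustration only.
query = build_parse_query(
    "/Volumes/main/default/invoices/",
    "main.default.invoices_parsed",
)
print(query)

# In a Databricks notebook you would then run:
# spark.sql(query)
```

The resulting table is what you point the Information Extraction agent at. Whether it handles the misaligned headers as well as your custom-trained model is something to check against the same four PDFs.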