I am using Azure Document Intelligence to extract data from a table in a PDF file. The table's headers do not visually align with the values beneath them, so the standard pre-built models cannot read the data correctly.
I have built a custom-trained Azure Document Intelligence model that reads the data perfectly. To train it, I first ran a layout scan of each PDF file. Then I created a new table-type field and manually labelled each value detected on the PDF, assigning it to one cell in the table field. After labelling 4 PDF files, I could train a reasonably good model.
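To make the labelling step concrete, here is a rough sketch of what it amounts to conceptually: each detected word has a position, and it gets assigned to a table cell regardless of where the header sits. This is not the Azure Document Intelligence API. The `Word` class, the `assign_cells` function, the coordinates, and the column boundaries are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A detected word with the centre of its bounding box (hypothetical units)."""
    text: str
    x: float
    y: float


def assign_cells(words, column_bounds, row_tolerance=5.0):
    """Place each word into a (row, column) cell.

    column_bounds maps a column name to a hand-chosen (min_x, max_x) range,
    which stands in for the manual alignment done while labelling.
    Words whose y differs from the current row's first word by at most
    row_tolerance are treated as belonging to that row.
    """
    rows = []  # list of (row_y, {column_name: text})
    for word in sorted(words, key=lambda w: (w.y, w.x)):
        column = next(
            (name for name, (lo, hi) in column_bounds.items() if lo <= word.x < hi),
            None,
        )
        if column is None:
            continue  # word lies outside every labelled column
        if rows and abs(word.y - rows[-1][0]) <= row_tolerance:
            cells = rows[-1][1]
        else:
            cells = {}
            rows.append((word.y, cells))
        cells[column] = f"{cells[column]} {word.text}" if column in cells else word.text
    return [cells for _, cells in rows]


# Made-up layout output: two rows, two columns.
words = [
    Word("Widget", 40, 100),
    Word("12", 130, 101),
    Word("Gadget", 42, 120),
    Word("7", 128, 119),
]
# Column ranges chosen by hand, independent of where the headers are printed.
column_bounds = {"item": (0, 100), "quantity": (100, 200)}

table = assign_cells(words, column_bounds)
print(table)
```

The trained custom model effectively learns this word-to-cell mapping from the labelled examples, so I do not have to hard-code coordinates like the ones above.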
Can I do the same, or something similar, on Databricks using only Databricks features, without Azure Document Intelligence?