Generative AI
Explore discussions on generative artificial intelligence techniques and applications within the Databricks Community. Share ideas, challenges, and breakthroughs in this cutting-edge field.

Azure Content Understanding Equivalent

ndw
New Contributor II

Hi all,

I am exploring Databricks services or components that could be considered equivalent to Azure Document Intelligence and Azure Content Understanding.

Our customer works with dozens of Excel and PDF files. These files follow multiple template types, and the formats may evolve over time. For example, some files contain data in a standard tabular structure, others use pivot-style Excel layouts, and some follow more complex or semi-structured formats.

We already have a Databricks license. Instead of relying on Azure Content Understanding, we would like to understand whether Databricks can be used to automatically infer file structures and extract the required values.

As an example, if “England” appears on the row axis and “20251205” appears as a column header in a pivot table, we would like to normalize this into a record such as:
20251205, England, sales_amount = 500,000 GBP.
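To make the target shape concrete, here is a minimal sketch of that normalization in plain Python. It assumes the simplest pivot layout only: one header row holding the column keys (dates) and one leading column holding the row keys (regions). The `unpivot` helper, the field names, and the sample values are hypothetical illustrations, not a Databricks API; real templates with merged cells or multi-level headers would need more handling.

```python
def unpivot(rows, value_name="sales_amount", currency="GBP"):
    """Turn a pivot-style grid into one record per (column key, row key) cell.

    Assumes rows[0] is the header row (corner cell, then column keys) and
    every later row starts with its row key followed by the cell values.
    """
    header, *body = rows
    column_keys = header[1:]  # skip the empty corner cell
    records = []
    for row in body:
        row_key, *values = row
        for column_key, value in zip(column_keys, values):
            if value is None:  # skip empty cells instead of emitting blank records
                continue
            records.append(
                {
                    "date": str(column_key),
                    "region": row_key,
                    value_name: value,
                    "currency": currency,
                }
            )
    return records


# Hypothetical sheet contents, shaped like rows read from an Excel worksheet.
sheet = [
    [None, "20251205", "20251206"],
    ["England", 500000, 510000],
    ["Scotland", 120000, None],
]

records = unpivot(sheet)
for record in records:
    print(record)
```

The hard part of the requirement is not this reshaping step but detecting which layout a given file uses, since the helper above only works once the header row and the label column are known.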

I have also attached sample Excel templates, which represent several of the formats we receive. If we extract text from these Excel files and invoke the Databricks ai_parse_document function, I am not confident that the contextual meaning will be preserved. For instance, Column B represents the laboratory method used for experiments; however, this information is not explicitly labeled or defined within the Excel structure itself.

In addition, the ai_parse_document function does not support multiple languages.

I have reviewed other Databricks capabilities such as ai_query, ai_extract, and Agent Bricks, but I am still uncertain which solution or combination of technologies would be the most appropriate fit for this use case.

Could you please advise how this requirement could be implemented using Databricks services or components?

Best regards,
