Generative AI
Explore discussions on generative artificial intelligence techniques and applications within the Databricks Community. Share ideas, challenges, and breakthroughs in this cutting-edge field.

Agent Bricks Information Extraction

Yash01Kumar12
Visitor

I am facing a problem with Information Extraction from PDFs. I have done all the necessary steps:

1) I loaded the data into a Volume.
2) I ran the "Use PDFs" functionality to create a structured table from the PDFs.

3) I now have the table with the column names: 

path -> string
raw_parsed -> variant
text -> string
error_status -> string

I am experiencing a problem while directly creating a Vector Search Index from this:


Yash01Kumar12_0-1760117409670.png

Is there any reason why this is happening?

Second, I am facing problems creating an Information Extraction Agent. The error I am getting is:

Yash01Kumar12_1-1760117530018.png

Please help me understand what I am doing wrong in this process.


1) Databricks workspace region: East US 2 (Azure), which is correct as per the documentation.

2) I am able to use ai_query().

 

2 REPLIES

NandiniN
Databricks Employee

Hi @Yash01Kumar12 

1. For the Index creation failure: Invalid column type - variant is used for 'raw_parsed'. 

The indexing mechanism does not support the VARIANT data type for columns that need to be indexed. It only supports specific types, such as the numeric types, strings, timestamps, and arrays of numeric types (for vector embeddings).

While the VARIANT data type is in Public Preview, I see an internal feature request "DB-I-14338" to support the VARIANT data type in Vector Search. So I suspect you will have to change the data type to proceed right now. I have added your use case to that request, along with a proxy vote.

In a Delta table, you would use the statement below to enable the VARIANT type. However, I do not think this will help, because the type also has to be supported in Vector Search.

ALTER TABLE table_name SET TBLPROPERTIES('delta.feature.variantType-preview' = 'supported')

So, for now, can you use a different data type for the column, one that is supported today?
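One way to do that is sketched below: copy only the STRING columns into a new table and build the index on that table instead. This is a minimal sketch, not a confirmed fix. The table names are hypothetical, `path` is assumed to be unique per row so it can serve as the index primary key, and Change Data Feed is enabled because a Delta Sync index expects it on the source table.

```python
def build_index_source_sql(source_table: str, target_table: str) -> str:
    """Build a CTAS statement that leaves the VARIANT column behind.

    Only `path` and `text` (both STRING in the table described above) are
    copied, so the new table contains no column of type VARIANT.
    """
    return (
        f"CREATE OR REPLACE TABLE {target_table}\n"
        # Delta Sync indexes read the source table's Change Data Feed.
        "TBLPROPERTIES (delta.enableChangeDataFeed = true)\n"
        "AS SELECT path, text\n"
        f"FROM {source_table}\n"
        # Rows without parsed text have nothing to embed.
        "WHERE text IS NOT NULL"
    )


# Hypothetical table names; replace with your own catalog.schema.table.
stmt = build_index_source_sql(
    "main.default.parsed_pdfs",
    "main.default.parsed_pdfs_for_index",
)
print(stmt)
# In a Databricks notebook, where `spark` is predefined: spark.sql(stmt)
```

If you need the parsed structure later, the original table still holds `raw_parsed`; the new table is only a Vector Search-friendly copy.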

Thanks & Regards,

Nandini

NandiniN
Databricks Employee

For issue 2:

INVALID_PARAMETER_VALUE: Couldn't find enough valid rows in the selected table. Found 0 rows, minimum required is 1 for agent creation.

Since the previous step was failing due to a data type issue (VARIANT), it's highly likely that the entire table is being marked as invalid or unusable by the Agent framework's sampling mechanism.
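To check that guess, it may help to count how many rows actually contain usable text. The sketch below builds such a query. The table name is hypothetical, and the notion of a "valid" row used here (non-empty `text`) is my assumption, not the documented rule the agent applies.

```python
def build_row_check_sql(table: str) -> str:
    """Build a query that summarizes how many rows look usable."""
    return (
        "SELECT\n"
        "  count(*) AS total_rows,\n"
        # Assumption: a usable row has non-empty parsed text.
        "  count_if(text IS NOT NULL AND length(trim(text)) > 0) AS rows_with_text,\n"
        # Rows where the parsing step recorded something in error_status.
        "  count_if(error_status IS NOT NULL) AS rows_with_error_status\n"
        f"FROM {table}"
    )


# Hypothetical table name; replace with your own catalog.schema.table.
check_sql = build_row_check_sql("main.default.parsed_pdfs")
print(check_sql)
# In a Databricks notebook, where `spark` is predefined:
# spark.sql(check_sql).show()
```

If `rows_with_text` comes back as 0, the problem is in the parsing step rather than in the agent; if it is greater than 0, the VARIANT column is the more likely cause.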

 
