How to perform combined search on structured and unstructured data in Databricks using RAG or other
11-28-2024 04:32 AM
I created a RAG application in Databricks that performs the following steps:
1. Extract text from PDF files
2. Compute embeddings on the extracted text and create a vector search index
3. Create an LLM model and serve it so that it can answer questions based on the PDF data.
I mainly followed the demo at the URL below to build it:
https://notebooks.databricks.com/demos/llm-rag-chatbot/index.html#
Now I also have some structured data in Delta tables. I want to perform a combined search over the text extracted from the PDFs and the data in the structured tables.
I know that we can create a vector search index on a structured table and use it for searching. The problem with this approach is that I need to create a separate vector search index for each table, and the embeddings in a vector search index are created on only one column, which is then the only column searched via embeddings. How can I use all the columns from multiple structured tables and perform a combined search via LLMs in Databricks?
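To make the single-column limitation concrete, here is a minimal sketch of one possible workaround, not a confirmed Databricks recipe: flatten every column of each row into one text string and union the rows from all tables into a single set of records, so that one index with one embedding column could cover them. The function names `row_to_text` and `build_documents`, the `id`/`source`/`text` fields, and the sample tables are all hypothetical.

```python
from typing import Any, Dict, Iterable, List


def row_to_text(table: str, row: Dict[str, Any]) -> str:
    """Flatten one row into a single 'column: value' string, skipping nulls."""
    parts = [f"{col}: {val}" for col, val in row.items() if val is not None]
    return f"table: {table} | " + " | ".join(parts)


def build_documents(tables: Dict[str, Iterable[Dict[str, Any]]]) -> List[Dict[str, str]]:
    """Union rows from several tables into one list of {id, source, text} records."""
    docs: List[Dict[str, str]] = []
    for table, rows in tables.items():
        for i, row in enumerate(rows):
            docs.append(
                {
                    "id": f"{table}-{i}",  # hypothetical key: table name plus row position
                    "source": table,       # keeps track of which table the row came from
                    "text": row_to_text(table, row),
                }
            )
    return docs


# Hypothetical sample data standing in for rows read from Delta tables.
sample_tables = {
    "orders": [{"order_id": 1, "status": "shipped", "note": None}],
    "customers": [{"name": "Ada"}],
}
docs = build_documents(sample_tables)
```

The resulting records could be written to one Delta table and indexed on the `text` column. Whether this works well for numeric or aggregate questions is an open question; row-level text tends to suit lookups better than calculations.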
I also tried using Genie in Databricks. With it I can search only the structured data and not the text extracted from the PDF files, so I could not use it.
I am also open to any other option that works in Databricks.
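As one example of such an option, and only as a sketch under assumptions: keep the two retrieval paths separate and merge their results into a single prompt for the LLM. The names `build_prompt` and `combined_prompt` and the two retriever callables are hypothetical placeholders. In practice one callable might wrap a vector search query over the PDF chunks and the other a SQL query over the Delta tables.

```python
from typing import Callable, List


def build_prompt(question: str, pdf_chunks: List[str], table_rows: List[str]) -> str:
    """Assemble one prompt from both kinds of retrieved context."""
    lines = ["Answer using only the context below.", "", "PDF excerpts:"]
    lines += [f"- {c}" for c in pdf_chunks] or ["- (none)"]
    lines += ["", "Table rows:"]
    lines += [f"- {r}" for r in table_rows] or ["- (none)"]
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


def combined_prompt(
    question: str,
    retrieve_pdf: Callable[[str], List[str]],   # hypothetical: vector search over PDF text
    retrieve_rows: Callable[[str], List[str]],  # hypothetical: SQL lookup over Delta tables
) -> str:
    """Call both retrievers for the same question and merge the results."""
    return build_prompt(question, retrieve_pdf(question), retrieve_rows(question))


# Stub retrievers standing in for the real search backends.
prompt = combined_prompt(
    "What is the warranty period for order 1?",
    lambda q: ["Warranty lasts 12 months from delivery."],
    lambda q: ["order_id: 1 | status: shipped"],
)
```

The returned string would then be sent to the served model. The design keeps each data type on the retrieval method that suits it, at the cost of running two lookups per question.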