Databricks Vector Search is a vector database that is built into the Databricks Data Intelligence Platform and integrated with its governance and productivity tools. A vector database is optimized to store and retrieve embeddings. Embeddings are mathematical representations of the semantic content of data, typically text or image data. Embeddings are generated by a Large Language Model (LLM) and are a key component of many GenAI applications that depend on finding documents or images that are similar to each other. Examples are Retrieval Augmented Generation (RAG) systems, recommender systems, and image and video recognition.
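To make "similar" concrete: vector databases typically rank items by the distance between their embedding vectors, commonly cosine similarity. Here is a minimal Python illustration; the three-dimensional vectors are toy values for demonstration, not real LLM embeddings (which typically have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors:
    close to 1.0 means semantically similar, close to 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings: the first two point in similar directions.
doc_cat = [0.9, 0.1, 0.0]
doc_kitten = [0.8, 0.2, 0.1]
doc_invoice = [0.0, 0.1, 0.9]

# A query embedded near "cat" retrieves "kitten" before "invoice".
assert cosine_similarity(doc_cat, doc_kitten) > cosine_similarity(doc_cat, doc_invoice)
```

A vector index is, at heart, a data structure that answers "which stored vectors score highest against this query vector" efficiently at scale.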
For more details on how to create a Vector Search index, how it works, and how similarity search is performed against it, refer to the Databricks Vector Search documentation.
There are two types of Vector Search indexes on the Databricks platform: Delta Sync Index and Direct Vector Access Index.
We recommend using a Delta Sync Index if your use case supports it. A Delta Sync Index provides an easy-to-use, automatic, managed ETL pipeline that keeps the vector index up to date, and it can use Databricks-managed embedding computation. However, if you need more flexibility to perform CRUD (create, read, update, and delete) operations on the data in your vector index, or you have already computed embeddings with a self-managed embedding model and they are ready to be ingested into the index, you would want to use a Direct Vector Access Index.
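For reference, creating a Direct Vector Access Index with the `databricks-vectorsearch` Python SDK (`VectorSearchClient().create_direct_access_index(...)`) takes roughly the following parameters. The endpoint name, index name, and schema below are illustrative placeholders, so treat this as a sketch to check against the SDK documentation:

```python
# Parameters you would pass to create_direct_access_index(...) in the
# databricks-vectorsearch SDK; all values here are illustrative placeholders.
direct_index_config = {
    "endpoint_name": "vs_endpoint",              # an existing Vector Search endpoint
    "index_name": "main.rag.docs_direct_index",  # Unity Catalog three-level name
    "primary_key": "chunk_id",
    "embedding_dimension": 1024,                 # BGE Large (English) emits 1024-dim vectors
    "embedding_vector_column": "embedding",
    "schema": {
        "chunk_id": "string",
        "doc_id": "string",
        "text": "string",
        "embedding": "array<float>",
    },
}
```

Note that, unlike a Delta Sync Index, you declare the schema and embedding column yourself, because you own both the data and the embedding computation.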
Unlike a Delta Sync Index, which uses managed DLT to automatically and incrementally update the index as the underlying Delta table changes, a Direct Vector Access Index has no built-in syncing process: you must build and manage your own pipeline to keep its data fresh. Data freshness is imperative for any trustworthy database, and a vector index is no exception. In this blog I will build an ETL pipeline that keeps a Direct Vector Access Index fresh in near real time as new documents are ingested from the source. I will also use this index to deploy a real-time Q&A chatbot using Databricks retrieval augmented generation (RAG) and serverless capabilities, leveraging the DBRX Instruct Foundation Model for smart responses against Databricks documentation (ingested as PDF). If you want to learn more about RAG and how to build it on the Databricks Data Intelligence Platform, you can refer to dbdemos.
You provide a source Delta table that contains pre-calculated embeddings. There is no automatic syncing when the Delta table is updated. You must manually update the index using the REST API when the embeddings table changes.
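As a sketch of what that manual update looks like, the Vector Search REST API exposes per-index data-modification endpoints (an `upsert-data` pattern) that accept the rows, including the pre-computed embedding, serialized as JSON. The index name, column names, and embedding values below are placeholders, and the exact endpoint path and field names should be checked against the Vector Search REST API reference:

```python
import json

# Hypothetical index and rows: `chunk_id` is the index's primary key, and
# the embeddings come from your own self-managed embedding model.
index_name = "main.rag.docs_direct_index"  # Unity Catalog three-level name
rows = [
    {"chunk_id": "doc1-0", "text": "Vector Search overview ...", "embedding": [0.12, -0.08, 0.33]},
    {"chunk_id": "doc1-1", "text": "Creating an index ...", "embedding": [0.05, 0.41, -0.17]},
]

# The upsert endpoint expects the rows serialized as a JSON string.
payload = {"inputs_json": json.dumps(rows)}
url = f"/api/2.0/vector-search/indexes/{index_name}/upsert-data"
```

A corresponding delete endpoint removes rows by primary key, which is what the refresh pipeline described below relies on.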
The following diagram illustrates the process:
The diagram at the end of this section shows the highlights of the ETL pipeline, which is triggered every time a new file is dropped into the Unity Catalog Volume. Databricks Auto Loader ingests the new data into the landing table, and the latest records are then merged into the staging table. The staging table data is chunked and converted into embeddings (using the BGE Large (English) Foundation Model on Databricks) before being inserted into the vector index. If an existing document is updated, all chunks of that document are first deleted from the vector index, and all the newly arrived records are then inserted. This ensures that the data from an existing document is completely replaced by the newly arrived version of the same document.
The notebooks available in this git repository illustrate the ETL process described in this section.
Retrieval Augmented Generation (RAG) is a generative AI design pattern that involves combining a Large Language Model (LLM) with external knowledge retrieval. RAG is required to connect real-time data to your generative AI applications. Doing so improves the accuracy and quality of the application, by providing your data as context to the LLM at inference time.
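Concretely, "providing your data as context at inference time" reduces to retrieving the most relevant chunks for a question and prepending them to the prompt sent to the LLM. A minimal sketch of that assembly step (the prompt template here is an illustrative choice, not the exact one used in the notebooks):

```python
def build_rag_prompt(question, retrieved_chunks):
    """Assemble an LLM prompt that grounds the answer in retrieved context.

    retrieved_chunks: top-k chunk texts returned by the vector index,
    ordered by similarity to the question's embedding.
    """
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(retrieved_chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_rag_prompt(
    "What is Databricks Vector Search?",
    ["Databricks Vector Search is a vector database built into the platform.",
     "A Direct Vector Access Index supports direct read and write of vectors."],
)
```

The resulting string is what gets sent to the foundation model endpoint, so the model answers from your documents rather than only from its training data.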
In the previous section we designed a mechanism to keep the vector index fresh in near real time. Let us use it to create a Q&A chatbot using RAG and Databricks serverless capabilities, leveraging the DBRX Instruct Foundation Model for smart responses against Databricks documentation.
Here are the high-level steps for building the Q&A chatbot application using RAG:
The architecture of this solution is shown in the following diagram. If you want the code to build this yourself, see the git repository with notebooks that walk through all of these steps. Descriptions of the high-level steps used in the notebooks are also provided in the Appendix section below.
In this blog we saw how to build an ETL pipeline that keeps data fresh in a Databricks Direct Vector Access Index, and how to quickly build a Q&A chatbot using Databricks retrieval augmented generation (RAG) and serverless capabilities, leveraging the DBRX Instruct Foundation Model for smart responses against Databricks documentation.
Use Databricks Vector Index and Foundation Model APIs to build your own Q&A chatbot employing Retrieval Augmented Generation (RAG) application architecture.
High-level steps for each notebook are given below. The notebooks are available in this git repository.
01-Direct Vector Access Index
This notebook sets up:
02-Deploy-RAG-Chatbot-Model
This notebook: