Generative AI
Explore discussions on generative artificial intelligence techniques and applications within the Databricks Community. Share ideas, challenges, and breakthroughs in this cutting-edge field.

Mosaic Vector Search

nagND
New Contributor II

I built a RAG application over a corpus of PDFs that I have on ADLS. Once I have parsed all the PDFs, where will the chunked text and the vector embeddings be stored so that I can start retrieval?

 

 

Accepted Solution

LauJohansson
Contributor

Option 1: Delta Sync Index with embeddings computed by Databricks

You provide a source Delta table that contains data in text format. Databricks calculates the embeddings, using a model that you specify, and optionally saves the embeddings to a table in Unity Catalog. As the Delta table is updated, the index stays synced with the Delta table.
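For illustration, here is a minimal sketch of how such an index might be created with the `databricks-vectorsearch` Python client. The catalog, table, column, and endpoint names are hypothetical placeholders, and it assumes the parsed PDF chunks have already been written to a Delta table with one row per chunk. Treat it as a starting point, not a verified setup for your workspace.

```python
def delta_sync_index_args(source_table, index_name, endpoint_name,
                          text_column, primary_key, embedding_endpoint):
    """Collect the keyword arguments for create_delta_sync_index."""
    return {
        "endpoint_name": endpoint_name,            # Vector Search endpoint that serves the index
        "source_table_name": source_table,         # Delta table holding the chunked text
        "index_name": index_name,                  # Unity Catalog name of the new index
        "pipeline_type": "TRIGGERED",              # sync on demand; "CONTINUOUS" is the alternative
        "primary_key": primary_key,                # unique id column of each chunk
        "embedding_source_column": text_column,    # text column that Databricks embeds
        "embedding_model_endpoint_name": embedding_endpoint,  # model serving endpoint
    }


def create_index(args):
    """Create the index. Needs a Databricks workspace and the client package."""
    # Imported lazily so the rest of the sketch runs without the package installed.
    from databricks.vector_search.client import VectorSearchClient

    client = VectorSearchClient()
    return client.create_delta_sync_index(**args)


# All names below are hypothetical; replace them with your own.
args = delta_sync_index_args(
    source_table="main.rag.pdf_chunks",
    index_name="main.rag.pdf_chunks_index",
    endpoint_name="my_vector_search_endpoint",
    text_column="chunk_text",
    primary_key="chunk_id",
    embedding_endpoint="my_embedding_endpoint",
)
# create_index(args)  # uncomment inside a Databricks workspace
```

In this layout the chunked text stays in the source Delta table, and the embeddings live in the index that is synced from it.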

The following diagram illustrates the process:

  1. Calculate query embeddings. Query can include metadata filters.

  2. Perform similarity search to identify most relevant documents.

  3. Return the most relevant documents and append them to the query.

[Diagram: LauJohansson_3-1729240335633.png]
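To make the three steps concrete, here is a toy, self-contained sketch in plain Python. It is not how Mosaic AI Vector Search works internally: the word-count "embedding", the cosine ranking, and the sample chunks are all assumptions chosen only to keep the example runnable without any service.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, source=None, k=2):
    # Step 1: calculate the query embedding; the metadata filter narrows the candidates.
    q = embed(query)
    candidates = [c for c in chunks if source is None or c["source"] == source]
    # Step 2: similarity search, most similar chunks first.
    ranked = sorted(candidates, key=lambda c: cosine(q, embed(c["text"])), reverse=True)
    return ranked[:k]


def build_prompt(query, hits):
    # Step 3: return the most relevant chunks and append them to the query.
    context = "\n".join(h["text"] for h in hits)
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical chunks, standing in for rows of the source Delta table.
chunks = [
    {"id": 1, "source": "a.pdf", "text": "Delta tables store the chunked text"},
    {"id": 2, "source": "a.pdf", "text": "Invoices are due in thirty days"},
    {"id": 3, "source": "b.pdf", "text": "The index stores vector embeddings"},
]
query = "where is the chunked text stored"
hits = retrieve(query, chunks, k=1)
prompt = build_prompt(query, hits)
```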

 


"Mosaic AI Vector Search is a vector database that is built into the Databricks Data Intelligence Platform and integrated with its governance and productivity tools. A vector database is a database that is optimized to store and retrieve embeddings. Embeddings are mathematical representations of the semantic content of data, typically text or image data. Embeddings are generated by a large language model and are a key component of many GenAI applications that depend on finding documents or images that are similar to each other. Examples are RAG systems, recommender systems, and image and video recognition.

With Mosaic AI Vector Search, you create a vector search index from a Delta table. The index includes embedded data with metadata. You can then query the index using a REST API to identify the most similar vectors and return the associated documents. You can structure the index to automatically sync when the underlying Delta table is updated."
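As a rough sketch of the query side, the `databricks-vectorsearch` Python client wraps the REST API mentioned above. The endpoint, index, and column names here are hypothetical, and the response shape handled by `rows_from_response` is an assumption based on what the client returned when I last used it, so check it against your own output.

```python
def rows_from_response(response):
    """Turn a similarity-search response into a list of dicts.

    Assumed shape (verify against your workspace):
    {"manifest": {"columns": [{"name": ...}, ...]},
     "result": {"data_array": [[...], ...]}}
    """
    names = [col["name"] for col in response["manifest"]["columns"]]
    return [dict(zip(names, row)) for row in response["result"]["data_array"]]


def search(endpoint_name, index_name, question, columns, k=3):
    """Query the index. Needs a Databricks workspace and the client package."""
    # Imported lazily so the rest of the sketch runs without the package installed.
    from databricks.vector_search.client import VectorSearchClient

    index = VectorSearchClient().get_index(
        endpoint_name=endpoint_name, index_name=index_name
    )
    response = index.similarity_search(
        query_text=question,   # embedded by Databricks for a Delta Sync Index with computed embeddings
        columns=columns,       # columns to return with each hit
        num_results=k,
    )
    return rows_from_response(response)


# Hand-written sample in the assumed shape, so the parsing can be tried offline.
sample = {
    "manifest": {"columns": [{"name": "chunk_id"}, {"name": "chunk_text"}, {"name": "score"}]},
    "result": {"data_array": [[1, "Delta tables store the chunked text", 0.73]]},
}
rows = rows_from_response(sample)

# Inside a workspace (hypothetical names):
# search("my_vector_search_endpoint", "main.rag.pdf_chunks_index",
#        "Where is the chunked text stored?", ["chunk_id", "chunk_text"])
```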



Have you read these docs?
https://docs.databricks.com/en/generative-ai/vector-search.html
https://docs.databricks.com/en/generative-ai/create-query-vector-search.html


