
Vector search index initialization very slow

RodrigoE
Visitor

Hello,

I am creating a vector search index and selected "Compute embeddings" for a Delta table with 19M records. The table has only two columns: ID (selected as the primary key) and Name (selected as the column to embed). The embedding model is databricks-gte-large-en.

The estimated time to complete index initialization is 15 days!
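For a sense of scale, the 15-day estimate implies a throughput of only about 14.7 rows per second. The sketch below works that out and shows how the wall-clock time would shrink if a faster endpoint sustained a multiple of that rate. The 10x and 50x speedups are purely illustrative assumptions, not measured figures for any endpoint.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def implied_rows_per_second(num_rows: int, days: float) -> float:
    """Average throughput implied by an ETA of `days` for `num_rows` rows."""
    return num_rows / (days * SECONDS_PER_DAY)


def days_to_embed(num_rows: int, rows_per_second: float) -> float:
    """Days needed to embed `num_rows` at a sustained `rows_per_second`."""
    return num_rows / rows_per_second / SECONDS_PER_DAY


rows = 19_000_000
current = implied_rows_per_second(rows, 15)
print(f"Current rate: {current:.1f} rows/s")  # prints "Current rate: 14.7 rows/s"

# Hypothetical speedups, e.g. from an endpoint with more capacity.
for speedup in (10, 50):
    eta = days_to_embed(rows, current * speedup)
    print(f"{speedup}x faster: {eta:.2f} days")
```

Because time scales inversely with throughput, the only lever that matters for a fixed 19M rows is the endpoint's sustained rate.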

I'm looking for advice to speed up initialization.

Thank you,

Rodrigo Escamilla


iyashk-DB
Databricks Employee

Hi Rodrigo,

The slowness you are seeing is because these embeddings are computed on the databricks-gte-large-en endpoint, which is a pay-per-token endpoint. Pay-per-token endpoints have high latency and limited throughput for bulk workloads like this one. If speed is a concern, we suggest taking one of the models in the system.ai schema and creating a Provisioned Throughput endpoint for it with a larger number of Model Units, which gives higher throughput and faster embedding computation. Then use that endpoint to compute the embeddings.
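Once a Provisioned Throughput endpoint exists, the index only needs to point its embedding column at that endpoint instead of databricks-gte-large-en. The sketch below builds the request body for creating a Delta Sync index with computed embeddings, based on our reading of the Vector Search REST API (POST /api/2.0/vector-search/indexes); please check the field names against the current API reference. All names in it (the table, index, vector search endpoint, and the "gte-large-pt" serving endpoint) are hypothetical placeholders, and the code only builds and prints the JSON, it does not call Databricks.

```python
import json


def delta_sync_index_request(
    index_name: str,
    vs_endpoint: str,
    source_table: str,
    primary_key: str,
    text_column: str,
    embedding_endpoint: str,
) -> dict:
    """Build a request body for a Delta Sync index whose embeddings are
    computed by `embedding_endpoint` (assumed field names, verify in docs)."""
    return {
        "name": index_name,
        "endpoint_name": vs_endpoint,
        "primary_key": primary_key,
        "index_type": "DELTA_SYNC",
        "delta_sync_index_spec": {
            "source_table": source_table,
            "pipeline_type": "TRIGGERED",
            "embedding_source_columns": [
                {
                    "name": text_column,
                    # Point this at the provisioned endpoint, not pay-per-token.
                    "embedding_model_endpoint_name": embedding_endpoint,
                }
            ],
        },
    }


body = delta_sync_index_request(
    index_name="main.default.names_index",   # hypothetical
    vs_endpoint="my-vs-endpoint",            # hypothetical
    source_table="main.default.names",       # hypothetical
    primary_key="ID",
    text_column="Name",
    embedding_endpoint="gte-large-pt",       # hypothetical provisioned endpoint
)
print(json.dumps(body, indent=2))
```

In the UI, the equivalent step is choosing the provisioned endpoint as the embedding model when creating the index.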

RodrigoE
Visitor

Thank you very much for the fast response. I am new to Databricks (and to vector search). How do I "use the models present in the system.ai schema and create a Provisioned Throughput endpoint with a larger number of Model Units"?

Thank you,

Rodrigo Escamilla

iyashk-DB
Databricks Employee
