
Vector Index Creation for external embedding model takes a lot of time

rjain
New Contributor

I have an embedding model endpoint created and served. It is a Hugging Face model that Databricks doesn't provide. I am using this model to create a vector search index; however, the index takes a long time to get created. I observed that when I use the Databricks-offered embedding model (large_bge_en), it takes only seconds. Any suggestions on what could be going wrong in my case?

1 REPLY

mark_ott
Databricks Employee

The main reason your Hugging Face embedding model endpoint is taking much longer than Databricks' own large_bge_en model to build a vector search index is likely the difference in operational architecture and performance optimizations between external custom endpoints and native Databricks-managed models.

Key Factors Impacting Index Creation Time

  • API/Network Overhead: Using an external model (even if Hugging Face-hosted) involves network latency for every embedding call, which adds significant overhead, especially for large-scale batch operations.

  • Endpoint Scaling and Cold Starts: If your Hugging Face endpoint is set to scale to zero when idle, cold starts can add minutes to your first requests. Databricks-managed models are optimized to avoid such cold-start penalties.

  • Batching and Throughput: Databricks models are tightly integrated and can leverage optimized hardware accelerators, efficient batching, and parallelization. Hugging Face endpoints may have lower throughput limits, especially on public or lightly provisioned infrastructure.

  • Embedding Dimension Checks and Data Structure: Mismatches between the embedding size your model outputs and what the index expects can cause extra validation or conversion work, slowing the indexing pipeline (see the sketch after this list).

  • Serialization and Format: If your external endpoint returns embeddings in a different format or requires additional deserialization, this can also introduce latency compared to Databricksโ€™ direct-integration models.
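
To make the batching and dimension points concrete, here is a minimal sketch in Python. It assumes your endpoint is named my-hf-embedding-endpoint, that it follows the OpenAI-style embeddings request/response schema commonly exposed by Databricks serving endpoints, and that your index was built for 768-dimensional vectors; all three are placeholders for your actual setup.

```python
import time

import mlflow.deployments

# Placeholder endpoint name; assumes an OpenAI-style embeddings schema
# ({"input": [...]} in, {"data": [{"embedding": [...]}, ...]} out).
ENDPOINT = "my-hf-embedding-endpoint"
EXPECTED_DIM = 768  # the dimension the vector index was created with

client = mlflow.deployments.get_deploy_client("databricks")

texts = ["first document", "second document", "third document"]

start = time.time()
# One batched call instead of one call per document amortizes network overhead.
response = client.predict(endpoint=ENDPOINT, inputs={"input": texts})
elapsed = time.time() - start

embeddings = [row["embedding"] for row in response["data"]]
print(f"{len(embeddings)} embeddings in {elapsed:.2f}s")

# A dimension mismatch here would explain extra work (or failures) at index time.
actual_dim = len(embeddings[0])
assert actual_dim == EXPECTED_DIM, f"dimension mismatch: {actual_dim} != {EXPECTED_DIM}"
```

If the first call is dramatically slower than subsequent ones, that points at a cold start rather than per-request latency.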

Best Practices and Suggestions

  • Precompute Embeddings: Rather than calling the external endpoint live during indexing, precompute and store embeddings for your dataset, then build the index from this static data (self-managed embeddings). This is the fastest approach and is the method Databricks benchmarks rely on (a sketch follows after this list).

  • Optimize Endpoint Provisioning: Ensure your Hugging Face endpoint has adequate resources and does not scale to zero. If possible, provision for high concurrency and throughput to reduce latency (see the endpoint-configuration sketch at the end of this post).

  • Batch Requests: If your endpoint supports batching, maximize batch sizes to reduce per-request overhead and make more efficient use of resources, as in the batching sketch above.

  • Monitor and Benchmark: Regularly profile the performance of both embedding generation and index building. Look for bottlenecks in network, serialization, or dimension mismatches.

  • Consider Edge Models or Hosting: When feasible, host the embedding model closer to your data, perhaps within Databricks itself, so you have greater control and minimize network latency.
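
For the precompute/self-managed path, here is a minimal sketch using the databricks-vectorsearch client. The endpoint, catalog, table, and index names are placeholders, and it assumes the source Delta table already contains a precomputed embedding column (array<float>) and has Change Data Feed enabled:

```python
from databricks.vector_search.client import VectorSearchClient

client = VectorSearchClient()

# Build a Delta Sync index from vectors you precomputed with your Hugging Face
# endpoint in a batch job, instead of embedding live during index creation.
index = client.create_delta_sync_index(
    endpoint_name="my-vector-search-endpoint",                 # placeholder
    index_name="main.my_schema.docs_index",                    # placeholder
    source_table_name="main.my_schema.docs_with_embeddings",   # placeholder
    pipeline_type="TRIGGERED",
    primary_key="id",
    embedding_dimension=768,              # must match your model's output
    embedding_vector_column="embedding",  # the precomputed array<float> column
)
```

Because the index only syncs vectors already stored in the table, index creation no longer waits on the external endpoint at all.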

In summary, the main bottleneck is the extra latency introduced by the external Hugging Face endpoint, which is avoided by Databricks' optimized, tightly integrated offering. Moving to a precomputed/self-managed embedding workflow and tuning your endpoint can dramatically improve performance.
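
Finally, for the endpoint-provisioning point, a minimal sketch with the Databricks SDK that keeps the serving endpoint warm. The endpoint and Unity Catalog model names are placeholders; pick the workload size that matches your expected concurrency:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import ServedEntityInput

w = WorkspaceClient()

# Disable scale-to-zero so the first indexing requests don't pay a cold start.
w.serving_endpoints.update_config(
    name="my-hf-embedding-endpoint",  # placeholder endpoint name
    served_entities=[
        ServedEntityInput(
            entity_name="main.my_schema.my_hf_embedder",  # placeholder UC model
            entity_version="1",
            workload_size="Medium",  # size for expected throughput
            scale_to_zero_enabled=False,
        )
    ],
)
```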
