I'm working on using Databricks vector search combined with full-text search in my application. I want to filter queries by the id field in my vector search index. I noticed that there is a limit of 1,024 IDs per query when using filters.
If I need to filter on more than 1,024 IDs, my current idea is to run multiple queries in batches and then combine the results.
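To make the batching idea concrete, here is a rough sketch of what I have in mind. It is not tied to the real Databricks client: `search_fn` is a hypothetical placeholder for whatever call actually runs one filtered query, and I'm assuming it returns (id, score) pairs where a higher score means more similar.

```python
import heapq
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

MAX_IDS_PER_QUERY = 1024  # the per-query filter limit mentioned above

Hit = Tuple[str, float]  # (id, similarity score); higher = more similar (assumption)
# Hypothetical signature: search_fn(query_vector, id_batch, top_k) -> list of hits
SearchFn = Callable[[Sequence[float], Sequence[str], int], List[Hit]]


def chunked(ids: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `ids`, each at most `size` long."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def batched_search(
    query_vector: Sequence[float],
    ids: Sequence[str],
    search_fn: SearchFn,
    top_k: int = 10,
    batch_size: int = MAX_IDS_PER_QUERY,
) -> List[Hit]:
    """Run one filtered query per ID batch and keep the global top_k hits."""
    unique_ids = list(dict.fromkeys(ids))  # drop duplicates, keep order
    best: Dict[str, float] = {}
    for batch in chunked(unique_ids, batch_size):
        # Ask each batch for top_k so the global top_k cannot be missed.
        for doc_id, score in search_fn(query_vector, batch, top_k):
            if doc_id not in best or score > best[doc_id]:
                best[doc_id] = score
    # Merge step: sort all collected hits by score, best first.
    return heapq.nlargest(top_k, best.items(), key=lambda hit: hit[1])
```

The merge at the end only makes sense if scores from different batches are directly comparable, which is part of what I'm asking below.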
My questions are:
1. Is this batching approach reasonable for large filters?
2. Can I rely on the ANN (HNSW) search to return the same similarity score for a given document and query vector, regardless of which other IDs are included in the filter? Or could the scores and results vary depending on the set of IDs passed in each query?
Thanks in advance for any insights!