11-21-2025 07:49 AM
I need help writing a filter. I want to prefilter a vector index before performing a hybrid search, and wrap this in a function. Below is a simple example that searches for products for a given customer. The prefilter is key: it enforces authorization by restricting the vector space to the rows the customer is allowed to see before the top-k results are selected. However, I don't see any filter capability in the SQL function comparable to the filters argument you pass when calling the Python API.
Example of the search API with a prefilter:
results = index.similarity_search(
    query_text=question,
    query_type="HYBRID",
    columns=["content", "product", "product_description", "product_id", "purchase_date"],
    filters={"customer_email": customer_email},
    num_results=5
)
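To illustrate why the order matters, here is a minimal brute-force sketch comparing a prefilter (filter, then top-k) with a post-filter (top-k, then filter). It is a toy model under stated assumptions, not how Databricks Vector Search implements filtering: the in-memory INDEX, its rows, and the helper names are hypothetical, and similarity is plain cosine over 2-D vectors.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical toy index: (product_id, customer_email, embedding).
INDEX = [
    ("p1", "alice@example.com", [1.0, 0.0]),
    ("p2", "bob@example.com", [0.99, 0.1]),
    ("p3", "bob@example.com", [0.98, 0.2]),
    ("p4", "alice@example.com", [0.0, 1.0]),
    ("p5", "alice@example.com", [0.5, 0.5]),
]

def pre_filter_search(query, customer_email, k):
    # Restrict candidates to the customer's rows first, then take the top-k.
    candidates = [r for r in INDEX if r[1] == customer_email]
    ranked = sorted(candidates, key=lambda r: cosine(query, r[2]), reverse=True)
    return [r[0] for r in ranked[:k]]

def post_filter_search(query, customer_email, k):
    # Take the top-k over the whole index, then drop rows the customer may not see.
    ranked = sorted(INDEX, key=lambda r: cosine(query, r[2]), reverse=True)
    return [r[0] for r in ranked[:k] if r[1] == customer_email]

print(pre_filter_search([1.0, 0.0], "alice@example.com", 2))
print(post_filter_search([1.0, 0.0], "alice@example.com", 2))
```

With k=2, the prefilter returns two of Alice's products, while the post-filter spends part of its top-k on Bob's rows and returns only one. This is the shortfall a prefilter avoids.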
Below is the SQL function I need help with:
product_description STRING,
11-21-2025 10:58 AM
This documentation indicates that the SQL function VECTOR_SEARCH cannot apply a prefilter, even though prefiltering is a fundamental capability for vector search. I'm very surprised this is not supported.
11-22-2025 01:33 AM
@the_peterlandis, yes, the vector_search SQL function doesn't currently support prefiltering. However, if you need to implement this as a UC function, you can write it in Python and pass filters to similarity_search, along the lines of the example below.
%sql
CREATE OR REPLACE FUNCTION kaushal.kaushal.vector_similarity_search(
query_text STRING,
filter_id INT,
num_results INT
)
RETURNS STRING
LANGUAGE PYTHON
COMMENT "Vector similarity search using authenticated client"
ENVIRONMENT (
dependencies = '["databricks-vectorsearch", "databricks-sdk"]',
environment_version = 'None'
)
AS $$
import base64
import json

from databricks.sdk import WorkspaceClient
from databricks.vector_search.client import VectorSearchClient

# Read credentials from Databricks secrets; you'll need to create the scope and keys first.
# Note: WorkspaceClient() itself must be able to authenticate in the UDF environment.
def get_secret(scope, key):
    w = WorkspaceClient()
    # The Secrets API returns the value base64-encoded, so decode it to a string.
    return base64.b64decode(w.secrets.get_secret(scope=scope, key=key).value).decode("utf-8")

# Alternative: if dbutils is available in the UDF context
# token = dbutils.secrets.get(scope="your-scope", key="databricks-token")
# host = dbutils.secrets.get(scope="your-scope", key="databricks-host")

# Pass the credentials to the client explicitly rather than relying on environment variables
client = VectorSearchClient(
    workspace_url=get_secret("your-scope", "databricks-host"),
    personal_access_token=get_secret("your-scope", "databricks-token"),
    disable_notice=True,
)

index = client.get_index(
    endpoint_name="vector-search-demo-endpoint-kaushal",
    index_name="kaushal.kaushal.my_text_data_index",
)

# Prefilter: only rows whose id is in the list are searched before the top-k is taken
results = index.similarity_search(
    query_text=query_text,
    columns=["id", "content"],
    filters={"id": [filter_id]},
    num_results=num_results,
)

return json.dumps(results.get("result", {}).get("data_array", []))
$$;
Then run your UC function with SQL, and you should get the expected results.
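The function above returns only data_array, so the caller gets bare rows without column names. A minimal sketch of a helper that zips column names onto each row follows. It assumes the similarity_search response is a dict with manifest.columns (a list of objects with a "name" key) alongside result.data_array; the helper name rows_from_response and the sample response are hypothetical and hand-written for illustration, not real output.

```python
import json

def rows_from_response(response):
    # Assumed shape: {"manifest": {"columns": [{"name": ...}, ...]},
    #                 "result": {"data_array": [[...], ...]}}
    columns = [c["name"] for c in response.get("manifest", {}).get("columns", [])]
    rows = response.get("result", {}).get("data_array", [])
    # Pair each value in a row with its column name.
    return [dict(zip(columns, row)) for row in rows]

# Hypothetical, hand-written response mirroring the assumed shape.
sample = {
    "manifest": {"columns": [{"name": "id"}, {"name": "content"}, {"name": "score"}]},
    "result": {"data_array": [[1, "red shoes", 0.91], [2, "blue shoes", 0.85]]},
}

print(json.dumps(rows_from_response(sample)))
```

Inside the UC function, the last line could then become return json.dumps(rows_from_response(results)), so each result in the returned JSON string is keyed by column name.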