Re: Help with hybrid search function

the_peterlandis · ‎11-21-2025

I need help writing a filter. I want to pre-filter a vector index before performing a hybrid search and create this as a function. Below is a simple example of searching for products for a given customer. A prefilter is key as this provides authorizations for searching a vector index before applying the top k which reduces the vector space searching as a prefilter before the search. I am not seeing any filter capability like how you would call the API.
Example of the search API with prefilter

results = index.similarity_search(
query_text=question,
query_type = "HYBRID",
columns=["content", "product", "product_description", "product_id", "purchase_date"],
filters="{"customer_emai":customer_email},
num_results=5
)

Below is the SQL Function I need help on

%sql

CREATE OR REPLACE FUNCTION get_customer_products(

search_query string comment 'Search query',

customer_email string comment 'Customer email',

)

RETURNS TABLE (

product STRING,
product_description STRING,

product_id int,

purchase_date_timestamp timestamp

)

COMMENT 'Retrieve the customer products'

LANGUAGE SQL

RETURN (

SELECT

content,

product_id,

purchase_date_timestamp

from

vector_search(

index=> 'gold.my_store.products',

query => search_query,

query_type => 'HYBRID',

filter => '{"customer_email": "' || customer_email || '"}',

num_results => 5

)

the_peterlandis · ‎11-21-2025

Based on this documentation, it says, it indicate sql function VECTOR_SEARCH cannot apply pre filter which prefilter is a fundamental capability for vector search. Just very surprised this is not supported.

KaushalVachhani · ‎11-22-2025

@the_peterlandis, Yes, currently vector_search SQL function doesn't provide pre filter support. However, if you must implement the UC function for this, you can do it something like below using Python code with filters.

%sql
CREATE OR REPLACE FUNCTION kaushal.kaushal.vector_similarity_search(
  query_text STRING,
  filter_id INT,
  num_results INT
)
RETURNS STRING
LANGUAGE PYTHON
COMMENT "Vector similarity search using authenticated client"
ENVIRONMENT (
  dependencies = '["databricks-vectorsearch", "databricks-sdk"]',
  environment_version = 'None'
)
AS $$
import json
import os

# Get credentials from Databricks secrets
# You'll need to set these up first
def get_secret(scope, key):
    from databricks.sdk import WorkspaceClient
    w = WorkspaceClient()
    return w.secrets.get_secret(scope=scope, key=key).value

# Alternative: if dbutils is available in UDF context
# token = dbutils.secrets.get(scope="your-scope", key="databricks-token")
# host = dbutils.secrets.get(scope="your-scope", key="databricks-host")

# Set environment variables for authentication
os.environ['DATABRICKS_HOST'] = get_secret("your-scope", "databricks-host")
os.environ['DATABRICKS_TOKEN'] = get_secret("your-scope", "databricks-token")

from databricks.vector_search.client import VectorSearchClient

# Initialize client - should now pick up environment variables
client = VectorSearchClient()
index = client.get_index(
    endpoint_name="vector-search-demo-endpoint-kaushal",
    index_name="kaushal.kaushal.my_text_data_index"
)

results = index.similarity_search(
    query_text=query_text,
    columns=["id", "content"],
    filters={"id": [filter_id]},
    num_results=num_results
)

return json.dumps(results.get('result', {}).get('data_array', []))
$$;

Then run your UC function with SQL, and you should get the expected results.

View solution in original post