<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Help with Databricks vector search index advanced metadata filtering in Generative AI</title>
    <link>https://community.databricks.com/t5/generative-ai/help-with-databricks-vector-search-index-advanced-metadata/m-p/103970#M682</link>
    <description>&lt;P class="_1t7bu9h1 paragraph"&gt;Here is an example of how you can implement this in Python:&lt;/P&gt;
&lt;DIV class="gb5fhw2"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-python _1t7bu9hb hljs language-python gb5fhw3"&gt;
&lt;SPAN class="hljs-comment"&gt;# Step 1: Retrieve the data&lt;/SPAN&gt;
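response = index.similarity_search(query_text=&lt;SPAN class="hljs-string"&gt;"your_query"&lt;/SPAN&gt;, columns=[&lt;SPAN class="hljs-string"&gt;"id"&lt;/SPAN&gt;, &lt;SPAN class="hljs-string"&gt;"metadata_column"&lt;/SPAN&gt;], num_results=&lt;SPAN class="hljs-number"&gt;100&lt;/SPAN&gt;)
&lt;SPAN class="hljs-comment"&gt;# similarity_search returns a dict rather than a list of rows. Assuming the SDK's usual response layout,&lt;/SPAN&gt;
&lt;SPAN class="hljs-comment"&gt;# pair each row in result.data_array with the column names listed in the manifest.&lt;/SPAN&gt;
column_names = [col[&lt;SPAN class="hljs-string"&gt;"name"&lt;/SPAN&gt;] &lt;SPAN class="hljs-keyword"&gt;for&lt;/SPAN&gt; col &lt;SPAN class="hljs-keyword"&gt;in&lt;/SPAN&gt; response[&lt;SPAN class="hljs-string"&gt;"manifest"&lt;/SPAN&gt;][&lt;SPAN class="hljs-string"&gt;"columns"&lt;/SPAN&gt;]]
results = [&lt;SPAN class="hljs-built_in"&gt;dict&lt;/SPAN&gt;(&lt;SPAN class="hljs-built_in"&gt;zip&lt;/SPAN&gt;(column_names, row)) &lt;SPAN class="hljs-keyword"&gt;for&lt;/SPAN&gt; row &lt;SPAN class="hljs-keyword"&gt;in&lt;/SPAN&gt; response[&lt;SPAN class="hljs-string"&gt;"result"&lt;/SPAN&gt;][&lt;SPAN class="hljs-string"&gt;"data_array"&lt;/SPAN&gt;]]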

&lt;SPAN class="hljs-comment"&gt;# Step 2: Define the custom filtering function&lt;/SPAN&gt;
&lt;SPAN class="hljs-keyword"&gt;def&lt;/SPAN&gt; &lt;SPAN class="hljs-title function_"&gt;filter_by_intersection&lt;/SPAN&gt;(&lt;SPAN class="hljs-params"&gt;results, input_array&lt;/SPAN&gt;):
    filtered_results = []
    &lt;SPAN class="hljs-keyword"&gt;for&lt;/SPAN&gt; result &lt;SPAN class="hljs-keyword"&gt;in&lt;/SPAN&gt; results:
        metadata_array = result[&lt;SPAN class="hljs-string"&gt;"metadata_column"&lt;/SPAN&gt;]
        &lt;SPAN class="hljs-keyword"&gt;if&lt;/SPAN&gt; &lt;SPAN class="hljs-built_in"&gt;any&lt;/SPAN&gt;(item &lt;SPAN class="hljs-keyword"&gt;in&lt;/SPAN&gt; input_array &lt;SPAN class="hljs-keyword"&gt;for&lt;/SPAN&gt; item &lt;SPAN class="hljs-keyword"&gt;in&lt;/SPAN&gt; metadata_array):
            filtered_results.append(result)
    &lt;SPAN class="hljs-keyword"&gt;return&lt;/SPAN&gt; filtered_results

&lt;SPAN class="hljs-comment"&gt;# Step 3: Apply the custom filtering function&lt;/SPAN&gt;
input_array = [&lt;SPAN class="hljs-string"&gt;"value1"&lt;/SPAN&gt;, &lt;SPAN class="hljs-string"&gt;"value2"&lt;/SPAN&gt;, &lt;SPAN class="hljs-string"&gt;"value3"&lt;/SPAN&gt;]
filtered_results = filter_by_intersection(results, input_array)

&lt;SPAN class="hljs-comment"&gt;# The filtered_results now contain only the entries where the intersection is non-empty&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;/DIV&gt;
&lt;P class="_1t7bu9h1 paragraph"&gt;These steps filter on the array intersection after the search has run: the index returns its top matches first, and only rows that share at least one value with the input array are kept. Because the filter runs client-side on top of the existing Databricks vector search query, fewer than num_results rows may remain.&lt;/P&gt;</description>
    <pubDate>Thu, 02 Jan 2025 15:30:12 GMT</pubDate>
    <dc:creator>Walter_C</dc:creator>
    <dc:date>2025-01-02T15:30:12Z</dc:date>
    <item>
      <title>Help with Databricks vector search index advanced metadata filtering</title>
      <link>https://community.databricks.com/t5/generative-ai/help-with-databricks-vector-search-index-advanced-metadata/m-p/103870#M678</link>
      <description>&lt;P&gt;I have been able to successfully implement a Databricks vector search index with metadata filtering (&lt;A href="https://docs.databricks.com/en/generative-ai/create-query-vector-search.html#use-filters-on-queries" target="_blank" rel="noopener"&gt;How to create and query a vector search index | Databricks on AWS&lt;/A&gt;).&lt;/P&gt;&lt;P&gt;However, I am facing a challenge when implementing a more advanced filtering mechanism.&lt;/P&gt;&lt;P&gt;In my setup, I have a metadata column in the index that contains an array of strings. I need to create a filter that identifies matches based on the intersection between an input array and the index array. Specifically, a match should occur if the intersection returns at least one common value.&lt;/P&gt;&lt;P&gt;I don't see a straightforward way to do this with the existing Databricks vector search filter options.&lt;/P&gt;&lt;P&gt;Thanks for any advice!&lt;/P&gt;</description>
      <pubDate>Thu, 02 Jan 2025 08:01:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/help-with-databricks-vector-search-index-advanced-metadata/m-p/103870#M678</guid>
      <dc:creator>jericksoncea</dc:creator>
      <dc:date>2025-01-02T08:01:41Z</dc:date>
    </item>
    <item>
      <title>Re: Help with Databricks vector search index advanced metadata filtering</title>
      <link>https://community.databricks.com/t5/generative-ai/help-with-databricks-vector-search-index-advanced-metadata/m-p/103907#M679</link>
      <description>&lt;P&gt;Currently, the Databricks vector search filter options do not directly support filtering based on the intersection of arrays.&lt;/P&gt;</description>
      <pubDate>Thu, 02 Jan 2025 12:21:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/help-with-databricks-vector-search-index-advanced-metadata/m-p/103907#M679</guid>
      <dc:creator>Walter_C</dc:creator>
      <dc:date>2025-01-02T12:21:44Z</dc:date>
    </item>
    <item>
      <title>Re: Help with Databricks vector search index advanced metadata filtering</title>
      <link>https://community.databricks.com/t5/generative-ai/help-with-databricks-vector-search-index-advanced-metadata/m-p/103958#M681</link>
      <description>&lt;P&gt;Yes, I see that...&lt;/P&gt;&lt;P&gt;Are there any known workarounds? Some combination of existing filters or a code customization? This does not seem to be an uncommon search pattern...&lt;/P&gt;</description>
      <pubDate>Thu, 02 Jan 2025 14:57:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/help-with-databricks-vector-search-index-advanced-metadata/m-p/103958#M681</guid>
      <dc:creator>jericksoncea</dc:creator>
      <dc:date>2025-01-02T14:57:53Z</dc:date>
    </item>
    <item>
      <title>Re: Help with Databricks vector search index advanced metadata filtering</title>
      <link>https://community.databricks.com/t5/generative-ai/help-with-databricks-vector-search-index-advanced-metadata/m-p/103970#M682</link>
      <description>&lt;P class="_1t7bu9h1 paragraph"&gt;Here is an example of how you can implement this in Python:&lt;/P&gt;
&lt;DIV class="gb5fhw2"&gt;
&lt;PRE&gt;&lt;CODE class="markdown-code-python _1t7bu9hb hljs language-python gb5fhw3"&gt;
&lt;SPAN class="hljs-comment"&gt;# Step 1: Retrieve the data&lt;/SPAN&gt;
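response = index.similarity_search(query_text=&lt;SPAN class="hljs-string"&gt;"your_query"&lt;/SPAN&gt;, columns=[&lt;SPAN class="hljs-string"&gt;"id"&lt;/SPAN&gt;, &lt;SPAN class="hljs-string"&gt;"metadata_column"&lt;/SPAN&gt;], num_results=&lt;SPAN class="hljs-number"&gt;100&lt;/SPAN&gt;)
&lt;SPAN class="hljs-comment"&gt;# similarity_search returns a dict rather than a list of rows. Assuming the SDK's usual response layout,&lt;/SPAN&gt;
&lt;SPAN class="hljs-comment"&gt;# pair each row in result.data_array with the column names listed in the manifest.&lt;/SPAN&gt;
column_names = [col[&lt;SPAN class="hljs-string"&gt;"name"&lt;/SPAN&gt;] &lt;SPAN class="hljs-keyword"&gt;for&lt;/SPAN&gt; col &lt;SPAN class="hljs-keyword"&gt;in&lt;/SPAN&gt; response[&lt;SPAN class="hljs-string"&gt;"manifest"&lt;/SPAN&gt;][&lt;SPAN class="hljs-string"&gt;"columns"&lt;/SPAN&gt;]]
results = [&lt;SPAN class="hljs-built_in"&gt;dict&lt;/SPAN&gt;(&lt;SPAN class="hljs-built_in"&gt;zip&lt;/SPAN&gt;(column_names, row)) &lt;SPAN class="hljs-keyword"&gt;for&lt;/SPAN&gt; row &lt;SPAN class="hljs-keyword"&gt;in&lt;/SPAN&gt; response[&lt;SPAN class="hljs-string"&gt;"result"&lt;/SPAN&gt;][&lt;SPAN class="hljs-string"&gt;"data_array"&lt;/SPAN&gt;]]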

&lt;SPAN class="hljs-comment"&gt;# Step 2: Define the custom filtering function&lt;/SPAN&gt;
&lt;SPAN class="hljs-keyword"&gt;def&lt;/SPAN&gt; &lt;SPAN class="hljs-title function_"&gt;filter_by_intersection&lt;/SPAN&gt;(&lt;SPAN class="hljs-params"&gt;results, input_array&lt;/SPAN&gt;):
    filtered_results = []
    &lt;SPAN class="hljs-keyword"&gt;for&lt;/SPAN&gt; result &lt;SPAN class="hljs-keyword"&gt;in&lt;/SPAN&gt; results:
        metadata_array = result[&lt;SPAN class="hljs-string"&gt;"metadata_column"&lt;/SPAN&gt;]
        &lt;SPAN class="hljs-keyword"&gt;if&lt;/SPAN&gt; &lt;SPAN class="hljs-built_in"&gt;any&lt;/SPAN&gt;(item &lt;SPAN class="hljs-keyword"&gt;in&lt;/SPAN&gt; input_array &lt;SPAN class="hljs-keyword"&gt;for&lt;/SPAN&gt; item &lt;SPAN class="hljs-keyword"&gt;in&lt;/SPAN&gt; metadata_array):
            filtered_results.append(result)
    &lt;SPAN class="hljs-keyword"&gt;return&lt;/SPAN&gt; filtered_results

&lt;SPAN class="hljs-comment"&gt;# Step 3: Apply the custom filtering function&lt;/SPAN&gt;
input_array = [&lt;SPAN class="hljs-string"&gt;"value1"&lt;/SPAN&gt;, &lt;SPAN class="hljs-string"&gt;"value2"&lt;/SPAN&gt;, &lt;SPAN class="hljs-string"&gt;"value3"&lt;/SPAN&gt;]
filtered_results = filter_by_intersection(results, input_array)

&lt;SPAN class="hljs-comment"&gt;# The filtered_results now contain only the entries where the intersection is non-empty&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;/DIV&gt;
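&lt;P&gt;To try the post-filter idea above without a live index, here is a minimal sketch that runs against a hand-built response. It assumes the response carries column names under manifest.columns and rows under result.data_array; fake_response, rows_from_response, and post_filter_top_k are hypothetical names used only for illustration. Stopping once k rows are kept is an added design choice so that an over-fetched result set can be trimmed back to the size you actually want.&lt;/P&gt;

```python
def rows_from_response(response):
    """Pair each row of result.data_array with the manifest column names."""
    names = [col["name"] for col in response["manifest"]["columns"]]
    return [dict(zip(names, row)) for row in response["result"]["data_array"]]


def post_filter_top_k(response, column, wanted, k):
    """Keep rows whose array column shares a value with wanted, up to k rows."""
    wanted_set = set(wanted)
    kept = []
    for row in rows_from_response(response):
        values = row.get(column) or []
        # isdisjoint is False as soon as one common value exists
        if not wanted_set.isdisjoint(values):
            kept.append(row)
        if len(kept) == k:
            break
    return kept


# Hand-built stand-in for a similarity_search response (assumed layout, made-up data).
fake_response = {
    "manifest": {"columns": [{"name": "id"}, {"name": "metadata_column"}, {"name": "score"}]},
    "result": {
        "data_array": [
            [1, ["value1", "other"], 0.91],
            [2, ["unrelated"], 0.88],
            [3, ["value3"], 0.80],
            [4, ["value2", "value3"], 0.75],
        ]
    },
}

top_two = post_filter_top_k(fake_response, "metadata_column", ["value1", "value2", "value3"], k=2)
print([row["id"] for row in top_two])  # prints [1, 3]
```

&lt;P&gt;Rows stay in the order the search returned them, so the kept rows are still ranked by similarity. If your index returns the array column as a serialized string instead of a list, parse it before the intersection check.&lt;/P&gt;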
&lt;P class="_1t7bu9h1 paragraph"&gt;These steps filter on the array intersection after the search has run: the index returns its top matches first, and only rows that share at least one value with the input array are kept. Because the filter runs client-side on top of the existing Databricks vector search query, fewer than num_results rows may remain.&lt;/P&gt;</description>
      <pubDate>Thu, 02 Jan 2025 15:30:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/help-with-databricks-vector-search-index-advanced-metadata/m-p/103970#M682</guid>
      <dc:creator>Walter_C</dc:creator>
      <dc:date>2025-01-02T15:30:12Z</dc:date>
    </item>
    <item>
      <title>Re: Help with Databricks vector search index advanced metadata filtering</title>
      <link>https://community.databricks.com/t5/generative-ai/help-with-databricks-vector-search-index-advanced-metadata/m-p/103977#M683</link>
      <description>&lt;P&gt;The above solution is effectively a post-search filter, which would reduce the number of results returned. I am looking for a solution that performs the filtering on the index itself.&lt;/P&gt;</description>
      <pubDate>Thu, 02 Jan 2025 16:21:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/help-with-databricks-vector-search-index-advanced-metadata/m-p/103977#M683</guid>
      <dc:creator>jericksoncea</dc:creator>
      <dc:date>2025-01-02T16:21:09Z</dc:date>
    </item>
    <item>
      <title>Re: Help with Databricks vector search index advanced metadata filtering</title>
      <link>https://community.databricks.com/t5/generative-ai/help-with-databricks-vector-search-index-advanced-metadata/m-p/103988#M684</link>
      <description>&lt;P&gt;Let me look further and see whether there is another approach.&lt;/P&gt;</description>
      <pubDate>Thu, 02 Jan 2025 17:05:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/help-with-databricks-vector-search-index-advanced-metadata/m-p/103988#M684</guid>
      <dc:creator>Walter_C</dc:creator>
      <dc:date>2025-01-02T17:05:29Z</dc:date>
    </item>
    <item>
      <title>Re: Help with Databricks vector search index advanced metadata filtering</title>
      <link>https://community.databricks.com/t5/generative-ai/help-with-databricks-vector-search-index-advanced-metadata/m-p/104084#M685</link>
      <description>&lt;P&gt;Unfortunately, I was not able to find a way around that limitation of the solution proposed above.&lt;/P&gt;</description>
      <pubDate>Fri, 03 Jan 2025 14:11:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/help-with-databricks-vector-search-index-advanced-metadata/m-p/104084#M685</guid>
      <dc:creator>Walter_C</dc:creator>
      <dc:date>2025-01-03T14:11:41Z</dc:date>
    </item>
    <item>
      <title>Re: Help with Databricks vector search index advanced metadata filtering</title>
      <link>https://community.databricks.com/t5/generative-ai/help-with-databricks-vector-search-index-advanced-metadata/m-p/104121#M686</link>
      <description>&lt;P&gt;Thanks for your help...&lt;/P&gt;</description>
      <pubDate>Fri, 03 Jan 2025 17:28:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/help-with-databricks-vector-search-index-advanced-metadata/m-p/104121#M686</guid>
      <dc:creator>jericksoncea</dc:creator>
      <dc:date>2025-01-03T17:28:02Z</dc:date>
    </item>
    <item>
      <title>Re: Help with Databricks vector search index advanced metadata filtering</title>
      <link>https://community.databricks.com/t5/generative-ai/help-with-databricks-vector-search-index-advanced-metadata/m-p/104127#M687</link>
      <description>&lt;P&gt;Sure, happy to help. Let us know if you have additional questions.&lt;/P&gt;</description>
      <pubDate>Fri, 03 Jan 2025 17:43:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/help-with-databricks-vector-search-index-advanced-metadata/m-p/104127#M687</guid>
      <dc:creator>Walter_C</dc:creator>
      <dc:date>2025-01-03T17:43:58Z</dc:date>
    </item>
    <item>
      <title>Re: Help with Databricks vector search index advanced metadata filtering</title>
      <link>https://community.databricks.com/t5/generative-ai/help-with-databricks-vector-search-index-advanced-metadata/m-p/105601#M709</link>
      <description>&lt;P&gt;Hi. You can apply a filter on any metadata field in the index.&lt;BR /&gt;See the "Use filters on queries" section here: &lt;A href="https://docs.databricks.com/en/generative-ai/create-query-vector-search.html#query-a-vector-search-endpoint" target="_blank"&gt;How to create and query a vector search index | Databricks on AWS&lt;/A&gt;&lt;BR /&gt;The JSON filter syntax takes some getting used to but is flexible. Here's a snippet that shows how to do this:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;SEARCH_FILTER = {
    "language": "English",
    "source_types LIKE": "News"
}

# Limit to English News publications
results = vs.similarity_search(
    query_text="Articles that discuss GenAI ethics",
    filters=SEARCH_FILTER,
    num_results=4
)&lt;/LI-CODE&gt;</description>
      <pubDate>Tue, 14 Jan 2025 15:15:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/help-with-databricks-vector-search-index-advanced-metadata/m-p/105601#M709</guid>
      <dc:creator>txti</dc:creator>
      <dc:date>2025-01-14T15:15:45Z</dc:date>
    </item>
    <item>
      <title>Re: Help with Databricks vector search index advanced metadata filtering</title>
      <link>https://community.databricks.com/t5/generative-ai/help-with-databricks-vector-search-index-advanced-metadata/m-p/105691#M711</link>
      <description>&lt;P&gt;There is no filter operator based on the intersection of arrays.&lt;/P&gt;</description>
      <pubDate>Wed, 15 Jan 2025 10:45:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/help-with-databricks-vector-search-index-advanced-metadata/m-p/105691#M711</guid>
      <dc:creator>jericksoncea</dc:creator>
      <dc:date>2025-01-15T10:45:51Z</dc:date>
    </item>
    <item>
      <title>Re: Help with Databricks vector search index advanced metadata filtering</title>
      <link>https://community.databricks.com/t5/generative-ai/help-with-databricks-vector-search-index-advanced-metadata/m-p/105987#M712</link>
      <description>&lt;P&gt;I see, I did not read your question carefully enough.&lt;BR /&gt;If I now understand your requirement correctly, this syntax (from the docs) should do the trick:&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;No filter operator specified&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;Filter checks for an exact match. If multiple values are specified, it matches any of the values.&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;{"id": 200}&lt;BR /&gt;{"id": [200, 300]}&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;</description>
      <pubDate>Thu, 16 Jan 2025 19:16:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/generative-ai/help-with-databricks-vector-search-index-advanced-metadata/m-p/105987#M712</guid>
      <dc:creator>txti</dc:creator>
      <dc:date>2025-01-16T19:16:13Z</dc:date>
    </item>
  </channel>
</rss>

