<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Use Array in WHERE IN clause in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/use-array-in-where-in-clause/m-p/129336#M48501</link>
    <description>&lt;P&gt;I don't have access to computer right now, but I'll provide you tomorrow code that should work&lt;/P&gt;</description>
    <pubDate>Fri, 22 Aug 2025 18:12:56 GMT</pubDate>
    <dc:creator>szymon_dybczak</dc:creator>
    <dc:date>2025-08-22T18:12:56Z</dc:date>
    <item>
      <title>Use Array in WHERE IN clause</title>
      <link>https://community.databricks.com/t5/data-engineering/use-array-in-where-in-clause/m-p/129326#M48496</link>
      <description>&lt;P&gt;This is what I'm trying to do using SQL:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;&lt;EM&gt;create table check1 as&lt;/EM&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;EM&gt;select * from dataA&lt;/EM&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;EM&gt;where IDs in ('12483258','12483871','12483883');&lt;/EM&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;The list of IDs is much longer and may be changed so I want to use a variable for that. This is what I have tried&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;EM&gt;declare or replace ID&amp;nbsp;ARRAY&amp;lt;string&amp;gt;;&lt;/EM&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;EM&gt;SET VARIABLE ID&amp;nbsp;= ARRAY('12483258','12483871','12483883');&lt;/EM&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;EM&gt;create table check1 as&lt;/EM&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;EM&gt;select * from dataA&lt;/EM&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;EM&gt;where IDs in (ID);&lt;/EM&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;And I get the error message below&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;data type mismatch: Input to `in` should all be the same type, but it's ["STRING", "ARRAY&amp;lt;STRING&amp;gt;"]&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;It appears that databricks won't compare IDs column to the array elements. I also tried split and explode but didn't work, maybe the syntax wasn't right.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Fri, 22 Aug 2025 17:19:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/use-array-in-where-in-clause/m-p/129326#M48496</guid>
      <dc:creator>zc</dc:creator>
      <dc:date>2025-08-22T17:19:49Z</dc:date>
    </item>
    <item>
      <title>Re: Use Array in WHERE IN clause</title>
      <link>https://community.databricks.com/t5/data-engineering/use-array-in-where-in-clause/m-p/129332#M48498</link>
      <description>&lt;P&gt;Hi @,&lt;/P&gt;&lt;P&gt;This is expected. Your id column is of type string and you're trying to provide in in clause array&amp;lt;string&amp;gt;&lt;/P&gt;</description>
      <pubDate>Fri, 22 Aug 2025 17:46:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/use-array-in-where-in-clause/m-p/129332#M48498</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-08-22T17:46:41Z</dc:date>
    </item>
    <item>
      <title>Re: Use Array in WHERE IN clause</title>
      <link>https://community.databricks.com/t5/data-engineering/use-array-in-where-in-clause/m-p/129334#M48499</link>
      <description>&lt;P&gt;Thanks for responding. So what's the proper way of using an array there?&lt;/P&gt;</description>
      <pubDate>Fri, 22 Aug 2025 17:55:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/use-array-in-where-in-clause/m-p/129334#M48499</guid>
      <dc:creator>zc</dc:creator>
      <dc:date>2025-08-22T17:55:39Z</dc:date>
    </item>
    <item>
      <title>Re: Use Array in WHERE IN clause</title>
      <link>https://community.databricks.com/t5/data-engineering/use-array-in-where-in-clause/m-p/129336#M48501</link>
      <description>&lt;P&gt;I don't have access to computer right now, but I'll provide you tomorrow code that should work&lt;/P&gt;</description>
      <pubDate>Fri, 22 Aug 2025 18:12:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/use-array-in-where-in-clause/m-p/129336#M48501</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-08-22T18:12:56Z</dc:date>
    </item>
    <item>
      <title>Re: Use Array in WHERE IN clause</title>
      <link>https://community.databricks.com/t5/data-engineering/use-array-in-where-in-clause/m-p/129348#M48502</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/176758"&gt;@zc&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;I managed to reproduce your scenario and put together a possible solution—see if it helps.&lt;/P&gt;&lt;P&gt;Simulating the error:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="WiliamRosa_0-1755888823722.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/19275i627E3FC00C0AFB2B/image-size/medium?v=v2&amp;amp;px=400" role="button" title="WiliamRosa_0-1755888823722.png" alt="WiliamRosa_0-1755888823722.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Solution with array:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="WiliamRosa_1-1755888876798.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/19276i068939F78566E800/image-size/medium?v=v2&amp;amp;px=400" role="button" title="WiliamRosa_1-1755888876798.png" alt="WiliamRosa_1-1755888876798.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 22 Aug 2025 18:54:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/use-array-in-where-in-clause/m-p/129348#M48502</guid>
      <dc:creator>WiliamRosa</dc:creator>
      <dc:date>2025-08-22T18:54:58Z</dc:date>
    </item>
    <item>
      <title>Re: Use Array in WHERE IN clause</title>
      <link>https://community.databricks.com/t5/data-engineering/use-array-in-where-in-clause/m-p/129354#M48504</link>
      <description>&lt;P&gt;Root Cause&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;In your SQL statement, where IDs in (ID) is comparing a string column (IDs) with an array variable (ARRAY&amp;lt;STRING&amp;gt;).&lt;/LI&gt;&lt;LI&gt;Databricks SQL does not allow IN to directly take an array — IN expects a list of scalar values, not an array object.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;That’s why you see the error&lt;/P&gt;&lt;P&gt;Solution:&lt;/P&gt;&lt;P&gt;Try Using array_contains to structure your query.&lt;/P&gt;&lt;P&gt;Use array_contains&lt;/P&gt;&lt;P&gt;If you’re filtering using an array variable:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;DECLARE ids ARRAY&amp;lt;STRING&amp;gt;;&lt;/P&gt;&lt;P&gt;SET VARIABLE ids = ARRAY('12483258','12483871','12483883');&lt;/P&gt;&lt;P&gt;CREATE TABLE check1 AS&lt;/P&gt;&lt;P&gt;SELECT *&lt;/P&gt;&lt;P&gt;FROM dataA&lt;/P&gt;&lt;P&gt;WHERE array_contains(ids, IDs);&lt;/P&gt;</description>
      <pubDate>Fri, 22 Aug 2025 19:16:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/use-array-in-where-in-clause/m-p/129354#M48504</guid>
      <dc:creator>ManojkMohan</dc:creator>
      <dc:date>2025-08-22T19:16:36Z</dc:date>
    </item>
    <item>
      <title>Re: Use Array in WHERE IN clause</title>
      <link>https://community.databricks.com/t5/data-engineering/use-array-in-where-in-clause/m-p/129359#M48505</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/155141"&gt;@ManojkMohan&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you so much for your detailed explanation and perfect solution!&lt;/P&gt;</description>
      <pubDate>Fri, 22 Aug 2025 20:20:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/use-array-in-where-in-clause/m-p/129359#M48505</guid>
      <dc:creator>zc</dc:creator>
      <dc:date>2025-08-22T20:20:36Z</dc:date>
    </item>
    <item>
      <title>Re: Use Array in WHERE IN clause</title>
      <link>https://community.databricks.com/t5/data-engineering/use-array-in-where-in-clause/m-p/129360#M48506</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/179612"&gt;@WiliamRosa&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you for this, your solution works but in a different way. I've actually been using temp views in the same way, but that requires extra steps that's why I wanted to go for variables instead. But thank you for putting this together!&lt;/P&gt;</description>
      <pubDate>Fri, 22 Aug 2025 20:20:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/use-array-in-where-in-clause/m-p/129360#M48506</guid>
      <dc:creator>zc</dc:creator>
      <dc:date>2025-08-22T20:20:14Z</dc:date>
    </item>
    <item>
      <title>Re: Use Array in WHERE IN clause</title>
      <link>https://community.databricks.com/t5/data-engineering/use-array-in-where-in-clause/m-p/129435#M48528</link>
      <description>&lt;P&gt;Nice solutions!&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/155141"&gt;@ManojkMohan&lt;/a&gt;&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/179612"&gt;@WiliamRosa&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;I love the use of the temp view for the intermediate result. The array_contains is also a really nice touch.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/155141"&gt;@ManojkMohan&lt;/a&gt;&amp;nbsp;when you write "&lt;SPAN&gt;SET VARIABLE ids = &lt;STRONG&gt;ARRAY('12483258','12483871','12483883')&lt;/STRONG&gt;;" ... can this array be from another query to make it dynamic?&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;Really cool stuff.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/176758"&gt;@zc&lt;/a&gt;&amp;nbsp;I guess we could also use parameters:&amp;nbsp;&lt;A href="https://docs.databricks.com/aws/en/sql/user/queries/query-parameters#query-based-dropdown-list" target="_blank"&gt;https://docs.databricks.com/aws/en/sql/user/queries/query-parameters#query-based-dropdown-list&lt;/A&gt;&amp;nbsp;. The benefit of this is that the query can be maintained/updated outside of the script, given that you used a query based drop down list parameter. I haven't tried it but it makes sense, in theory. Example of using a parameter below:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="BS_THE_ANALYST_0-1755952877629.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/19299i830A2AB0DF844B64/image-size/medium?v=v2&amp;amp;px=400" role="button" title="BS_THE_ANALYST_0-1755952877629.png" alt="BS_THE_ANALYST_0-1755952877629.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;All the best,&lt;BR /&gt;BS&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 23 Aug 2025 12:44:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/use-array-in-where-in-clause/m-p/129435#M48528</guid>
      <dc:creator>BS_THE_ANALYST</dc:creator>
      <dc:date>2025-08-23T12:44:10Z</dc:date>
    </item>
    <item>
      <title>Re: Use Array in WHERE IN clause</title>
      <link>https://community.databricks.com/t5/data-engineering/use-array-in-where-in-clause/m-p/129650#M48585</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/146924"&gt;@BS_THE_ANALYST&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you for responding. Yeah I guess that's an option too. Talking about a list maintained outside of the script though, I'd prefer having a csv and import it as a temp view when needed.&lt;/P&gt;</description>
      <pubDate>Mon, 25 Aug 2025 16:33:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/use-array-in-where-in-clause/m-p/129650#M48585</guid>
      <dc:creator>zc</dc:creator>
      <dc:date>2025-08-25T16:33:57Z</dc:date>
    </item>
  </channel>
</rss>

