I've posted the same question on stackoverflow (link) as well. I will post any solution I get there.
I was trying to understand using many columns in the IN query and came across this statement.
SELECT (1, 2) IN (SELECT c1, c2 FROM VALUES(1, 2), (3, 4) AS (c1, c2));
Its output is as follows
(named_struct('1', 1, '2', 2) IN (listquery())) |
true |
I tried using named_struct directly as follows
SELECT NAMED_STRUCT('1', 1, '2', 2) IN (SELECT c1, c2 FROM VALUES(1, 2), (3, 4) AS (c1, c2));
It gives the following error
Cannot resolve "(named_struct(1, 1, 2, 2) IN (listquery()))" due to data type mismatch:
The number of columns in the left hand side of an IN subquery does not match the number of columns in the output of subquery.
Left hand side columns(length: 1): ["named_struct(1, 1, 2, 2)"],
right hand side columns(length: 2): ["c1", "c2"].;
line 1 pos 36;
'Project [unresolvedalias(named_struct(1, 1, 2, 2) IN (list#1529887 []), None)]
: +- Project [c1#1529888, c2#1529889]
: +- SubqueryAlias AS
: +- LocalRelation [c1#1529888, c2#1529889]
+- OneRowRelation
Why does the number of columns becomes an issue now? The above query works fine even when I replace named_struct with struct.
Any help is appreciated.