Using ODBC or JDBC to read from a table fails when I attempt to use an ORDER BY clause. In one sample case, I have a fairly small table (just 1946 rows).
select *
from some_table
order by some_field
Result:
java.lang.IllegalArgumentException: requirement failed: Subquery subquery#485, [id=#937] has not finished
at org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:53)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:445)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:269)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.sql.hive.thriftserver.ThriftLocalProperties.withLocalProperties(ThriftLocalProperties.scala:123)
at org.apache.spark.sql.hive.thriftserver.ThriftLocalProperties.withLocalProperties$(ThriftLocalProperties.scala:48)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:54)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:247)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:232)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:281)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.IllegalArgumentException: requirement failed: Subquery subquery#485, [id=#937] has not finished
at scala.Predef$.require(Predef.scala:281)
at org.apache.spark.sql.execution.ScalarSubquery.eval(subquery.scala:100)
at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:160)
This seems pretty strange to me: I can't run a simple ORDER BY clause?
I've tried this with both the ODBC driver and the JDBC driver (version 2.6.32). With JDBC, I tried all three UseNativeQuery modes (0, 1, and 2) just to exhaust the options; a sketch of roughly how I'm connecting is below. I don't have a query timeout specified, and setting a large timeout value explicitly made no difference.
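For reference, here is a minimal sketch of the JDBC side. I'm assuming the Databricks build of the Simba Spark driver here; the host, HTTP path, and token are placeholders, and the exact URL options may differ for other driver builds.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class OrderByRepro {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; UseNativeQuery is set directly in the URL.
        String url = "jdbc:databricks://<host>:443/default"
                + ";transportMode=http;ssl=1"
                + ";httpPath=<http-path>"
                + ";AuthMech=3;UID=token;PWD=<personal-access-token>"
                + ";UseNativeQuery=1"; // also tried 0 and 2
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement()) {
            // stmt.setQueryTimeout(300); // a large explicit timeout made no difference
            try (ResultSet rs = stmt.executeQuery(
                    "select * from some_table order by some_field")) {
                // The "Subquery ... has not finished" error surfaces during execution,
                // before any rows come back.
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}

The same statement without the ORDER BY clause reads the table without any trouble.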
My immediate impression is that something is wrong with the JDBC driver, though of course if I'm missing something obvious, that would be the simpler explanation; it seems like a surprising bug to find. If this really is a bug, where would I go to report it?