12-19-2022 04:45 AM
Hi, I am using Databricks and want to upgrade to Databricks Runtime 11.3 LTS, which now uses Spark 3.3.
Current system environment:
Target system environment:
After the upgrade, when I try to use the databricks-connector with Spark 3.3, my Spark jobs crash with the following stack trace:
py4j.protocol.Py4JJavaError: An error occurred while calling o47.sql.
: java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(Lorg/antlr/v4/runtime/ParserRuleContext;Lscala/Function0;)Ljava/lang/Object;
at io.delta.sql.parser.DeltaSqlAstBuilder.visitSingleStatement(DeltaSqlParser.scala:188)
at io.delta.sql.parser.DeltaSqlAstBuilder.visitSingleStatement(DeltaSqlParser.scala:143)
at io.delta.sql.parser.DeltaSqlBaseParser$SingleStatementContext.accept(DeltaSqlBaseParser.java:160)
at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:18)
at io.delta.sql.parser.DeltaSqlParser.$anonfun$parsePlan$1(DeltaSqlParser.scala:71)
at io.delta.sql.parser.DeltaSqlParser.parse(DeltaSqlParser.scala:100)
at io.delta.sql.parser.DeltaSqlParser.parsePlan(DeltaSqlParser.scala:70)
at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:620)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:620)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.lang.Thread.run(Thread.java:750)
After investigation, I found this is caused by Spark 3.3.x, because the same code worked perfectly with Spark 3.2.x.
It is blocking our upgrade from the 10.4 to the 11.3 runtime on Databricks.
Can you please guide me on this?
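For context, the NoSuchMethodError above indicates a binary mismatch: the delta-core jar on the classpath was compiled against a ParserUtils.withOrigin signature that Spark 3.3 no longer provides, which is what happens when a Delta build targeting Spark 3.2.x (such as delta-core 1.1.x) runs on Spark 3.3.x. Below is a minimal sketch of a local PySpark session that pins a matching Delta build; the version numbers follow Delta Lake's published compatibility matrix (Delta 2.1.x is the first line built for Spark 3.3.x), and the app name is just a placeholder.

from pyspark.sql import SparkSession

# Delta 1.1.x was built for Spark 3.2.x; Delta 2.1.x targets Spark 3.3.x.
# Running a 3.2-era delta-core on Spark 3.3 produces the NoSuchMethodError above.
DELTA_PACKAGE = "io.delta:delta-core_2.12:2.1.0"  # match this to your Spark version

spark = (
    SparkSession.builder
    .appName("delta-upgrade-check")  # placeholder name
    .config("spark.jars.packages", DELTA_PACKAGE)
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

spark.sql("select '1'").show()  # the statement that crashed before

Note that this applies to the local PySpark/connector environment shown in the Ivy log later in this thread; on Databricks clusters themselves, the matching Delta build ships with the runtime.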
12-19-2022 08:06 AM
Hi, could you please provide the complete logs? Also, could you please confirm how you determined that it is caused by Spark 3.3.x?
12-20-2022 01:49 AM
Hi @Debayan Mukherjee,
here is the whole output:
>>> spark = get_spark()
:: loading settings :: url = jar:file:/usr/local/lib/python3.8/dist-packages/pyspark/jars/ivy-2.5.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
io.delta#delta-core_2.12 added as a dependency
com.databricks#spark-xml_2.12 added as a dependency
org.postgresql#postgresql added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-1300bbeb-d826-46b8-a3f0-2502ebf354af;1.0
confs: [default]
found io.delta#delta-core_2.12;1.1.0 in central
found org.antlr#antlr4-runtime;4.8 in central
found org.codehaus.jackson#jackson-core-asl;1.9.13 in central
found com.databricks#spark-xml_2.12;0.14.0 in central
found commons-io#commons-io;2.8.0 in central
found org.glassfish.jaxb#txw2;2.3.4 in central
found org.apache.ws.xmlschema#xmlschema-core;2.2.5 in central
found org.postgresql#postgresql;42.2.23 in central
found org.checkerframework#checker-qual;3.5.0 in central
:: resolution report :: resolve 418ms :: artifacts dl 19ms
:: modules in use:
com.databricks#spark-xml_2.12;0.14.0 from central in [default]
commons-io#commons-io;2.8.0 from central in [default]
io.delta#delta-core_2.12;1.1.0 from central in [default]
org.antlr#antlr4-runtime;4.8 from central in [default]
org.apache.ws.xmlschema#xmlschema-core;2.2.5 from central in [default]
org.checkerframework#checker-qual;3.5.0 from central in [default]
org.codehaus.jackson#jackson-core-asl;1.9.13 from central in [default]
org.glassfish.jaxb#txw2;2.3.4 from central in [default]
org.postgresql#postgresql;42.2.23 from central in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 9 | 0 | 0 | 0 || 9 | 0 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-1300bbeb-d826-46b8-a3f0-2502ebf354af
confs: [default]
0 artifacts copied, 9 already retrieved (0kB/19ms)
22/12/20 09:45:23 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
>>> spark.sql("select '1'")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.8/dist-packages/pyspark/sql/session.py", line 1034, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery), self)
File "/usr/local/lib/python3.8/dist-packages/py4j/java_gateway.py", line 1321, in __call__
return_value = get_return_value(
File "/usr/local/lib/python3.8/dist-packages/pyspark/sql/utils.py", line 190, in deco
return f(*a, **kw)
File "/usr/local/lib/python3.8/dist-packages/py4j/protocol.py", line 326, in get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o47.sql.
: java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(Lorg/antlr/v4/runtime/ParserRuleContext;Lscala/Function0;)Ljava/lang/Object;
at io.delta.sql.parser.DeltaSqlAstBuilder.visitSingleStatement(DeltaSqlParser.scala:188)
at io.delta.sql.parser.DeltaSqlAstBuilder.visitSingleStatement(DeltaSqlParser.scala:143)
at io.delta.sql.parser.DeltaSqlBaseParser$SingleStatementContext.accept(DeltaSqlBaseParser.java:160)
at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:18)
at io.delta.sql.parser.DeltaSqlParser.$anonfun$parsePlan$1(DeltaSqlParser.scala:71)
at io.delta.sql.parser.DeltaSqlParser.parse(DeltaSqlParser.scala:100)
at io.delta.sql.parser.DeltaSqlParser.parsePlan(DeltaSqlParser.scala:70)
at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:620)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:620)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.lang.Thread.run(Thread.java:750)
More precisely, I ran into the issue with Databricks Runtime 11.3, which runs on Spark 3.3.0. I tried downgrading Spark to 3.2.1, and despite the incompatibility warning, the code worked.
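The Ivy output above also points at the mismatch: io.delta#delta-core_2.12;1.1.0 is resolved, and Delta 1.1.0 was built against Spark 3.2.x. One way to keep the jar pinned to the installed Python package, so the two cannot drift apart, is the configure_spark_with_delta_pip helper that ships with the delta-spark package. A minimal sketch, assuming delta-spark is pip-installed at a version matching the target Spark release (e.g. delta-spark==2.1.0 for Spark 3.3):

from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession

builder = (
    SparkSession.builder
    .appName("delta-pip-pinned")  # placeholder name
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)

# Adds the delta-core Maven coordinate matching the installed delta-spark
# wheel to spark.jars.packages before the session starts.
spark = configure_spark_with_delta_pip(builder).getOrCreate()
spark.sql("select '1'").show()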
12-21-2022 12:47 AM
Hi @Debayan Mukherjee, I think I found the root cause.
My Spark config for the Java libraries was not being upgraded in the pipeline for some reason.
After upgrading the Java libraries, the error was gone.
Anyway, thanks.
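When a pipeline can silently keep stale library pins like this, a quick runtime check makes the drift visible. Below is a minimal sketch that prints the Spark version and the jars the session actually loaded; it reaches through the private _jsc handle via py4j, so treat it as a debugging aid rather than a stable API.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

print("Spark version:", spark.version)
print("spark.jars.packages:", spark.conf.get("spark.jars.packages", "<not set>"))

# listJars() lives on the Scala SparkContext; access it through the py4j gateway.
jars = spark.sparkContext._jsc.sc().listJars()
for i in range(jars.size()):
    print("loaded jar:", jars.apply(i))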
12-26-2022 06:22 AM
Hi everyone, this information helped me, thanks.