12-19-2022 04:45 AM
Hi, I am using Databricks and want to upgrade to Databricks Runtime 11.3 LTS, which now uses Spark 3.3.
Current system environment:
Target system environment:
After the upgrade, when I try to use the databricks-connector with Spark 3.3, my Spark jobs crash with the following stack trace:
py4j.protocol.Py4JJavaError: An error occurred while calling o47.sql.
: java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(Lorg/antlr/v4/runtime/ParserRuleContext;Lscala/Function0;)Ljava/lang/Object;
at io.delta.sql.parser.DeltaSqlAstBuilder.visitSingleStatement(DeltaSqlParser.scala:188)
at io.delta.sql.parser.DeltaSqlAstBuilder.visitSingleStatement(DeltaSqlParser.scala:143)
at io.delta.sql.parser.DeltaSqlBaseParser$SingleStatementContext.accept(DeltaSqlBaseParser.java:160)
at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:18)
at io.delta.sql.parser.DeltaSqlParser.$anonfun$parsePlan$1(DeltaSqlParser.scala:71)
at io.delta.sql.parser.DeltaSqlParser.parse(DeltaSqlParser.scala:100)
at io.delta.sql.parser.DeltaSqlParser.parsePlan(DeltaSqlParser.scala:70)
at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:620)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:620)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.lang.Thread.run(Thread.java:750)
After investigation, I found this is caused by Spark 3.3.x, because the same code worked perfectly with Spark 3.2.x.
It is blocking our upgrade from the 10.4 to the 11.3 runtime on Databricks.
Can you please guide me on this?
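For context, the NoSuchMethodError above indicates a binary mismatch: the delta-core jar on the classpath was compiled against a ParserUtils.withOrigin signature that Spark 3.3 no longer provides, which is what happens when a Delta build targeting Spark 3.2.x (such as delta-core 1.1.x) runs on Spark 3.3.x. Below is a minimal sketch of a local PySpark session that pins a matching Delta build; the version numbers follow Delta Lake's published compatibility matrix (Delta 2.1.x is the first line built for Spark 3.3.x), and the app name is just a placeholder.

from pyspark.sql import SparkSession

# Delta 1.1.x was built for Spark 3.2.x; Delta 2.1.x targets Spark 3.3.x.
# Running a 3.2-era delta-core on Spark 3.3 produces the NoSuchMethodError above.
DELTA_PACKAGE = "io.delta:delta-core_2.12:2.1.0"  # match this to your Spark version

spark = (
    SparkSession.builder
    .appName("delta-upgrade-check")  # placeholder name
    .config("spark.jars.packages", DELTA_PACKAGE)
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

spark.sql("select '1'").show()  # the statement that crashed before

Note that this applies to the local PySpark/connector environment shown in the Ivy log later in this thread; on Databricks clusters themselves, the matching Delta build ships with the runtime.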
12-19-2022 08:06 AM
Hi, could you please provide the complete logs? Also, could you please confirm how you determined that it is caused by Spark 3.3.x?
12-20-2022 01:49 AM
Hi @Debayan Mukherjee,
here is the whole output:
>>> spark = get_spark()
:: loading settings :: url = jar:file:/usr/local/lib/python3.8/dist-packages/pyspark/jars/ivy-2.5.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
io.delta#delta-core_2.12 added as a dependency
com.databricks#spark-xml_2.12 added as a dependency
org.postgresql#postgresql added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-1300bbeb-d826-46b8-a3f0-2502ebf354af;1.0
confs: [default]
found io.delta#delta-core_2.12;1.1.0 in central
found org.antlr#antlr4-runtime;4.8 in central
found org.codehaus.jackson#jackson-core-asl;1.9.13 in central
found com.databricks#spark-xml_2.12;0.14.0 in central
found commons-io#commons-io;2.8.0 in central
found org.glassfish.jaxb#txw2;2.3.4 in central
found org.apache.ws.xmlschema#xmlschema-core;2.2.5 in central
found org.postgresql#postgresql;42.2.23 in central
found org.checkerframework#checker-qual;3.5.0 in central
:: resolution report :: resolve 418ms :: artifacts dl 19ms
:: modules in use:
com.databricks#spark-xml_2.12;0.14.0 from central in [default]
commons-io#commons-io;2.8.0 from central in [default]
io.delta#delta-core_2.12;1.1.0 from central in [default]
org.antlr#antlr4-runtime;4.8 from central in [default]
org.apache.ws.xmlschema#xmlschema-core;2.2.5 from central in [default]
org.checkerframework#checker-qual;3.5.0 from central in [default]
org.codehaus.jackson#jackson-core-asl;1.9.13 from central in [default]
org.glassfish.jaxb#txw2;2.3.4 from central in [default]
org.postgresql#postgresql;42.2.23 from central in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 9 | 0 | 0 | 0 || 9 | 0 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-1300bbeb-d826-46b8-a3f0-2502ebf354af
confs: [default]
0 artifacts copied, 9 already retrieved (0kB/19ms)
22/12/20 09:45:23 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
>>> spark.sql("select '1'")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.8/dist-packages/pyspark/sql/session.py", line 1034, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery), self)
File "/usr/local/lib/python3.8/dist-packages/py4j/java_gateway.py", line 1321, in __call__
return_value = get_return_value(
File "/usr/local/lib/python3.8/dist-packages/pyspark/sql/utils.py", line 190, in deco
return f(*a, **kw)
File "/usr/local/lib/python3.8/dist-packages/py4j/protocol.py", line 326, in get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o47.sql.
: java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(Lorg/antlr/v4/runtime/ParserRuleContext;Lscala/Function0;)Ljava/lang/Object;
at io.delta.sql.parser.DeltaSqlAstBuilder.visitSingleStatement(DeltaSqlParser.scala:188)
at io.delta.sql.parser.DeltaSqlAstBuilder.visitSingleStatement(DeltaSqlParser.scala:143)
at io.delta.sql.parser.DeltaSqlBaseParser$SingleStatementContext.accept(DeltaSqlBaseParser.java:160)
at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:18)
at io.delta.sql.parser.DeltaSqlParser.$anonfun$parsePlan$1(DeltaSqlParser.scala:71)
at io.delta.sql.parser.DeltaSqlParser.parse(DeltaSqlParser.scala:100)
at io.delta.sql.parser.DeltaSqlParser.parsePlan(DeltaSqlParser.scala:70)
at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:620)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:620)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.lang.Thread.run(Thread.java:750)
More precisely, I ran into the issue with Databricks Runtime 11.3, which runs on Spark 3.3.0. I tried downgrading Spark to 3.2.1, and despite the incompatibility warning, the code worked.
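The Ivy output above also points at the mismatch: io.delta#delta-core_2.12;1.1.0 is resolved, and Delta 1.1.0 was built against Spark 3.2.x. One way to keep the jar pinned to the installed Python package, so the two cannot drift apart, is the configure_spark_with_delta_pip helper that ships with the delta-spark package. A minimal sketch, assuming delta-spark is pip-installed at a version matching the target Spark release (e.g. delta-spark==2.1.0 for Spark 3.3):

from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession

builder = (
    SparkSession.builder
    .appName("delta-pip-pinned")  # placeholder name
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)

# Adds the delta-core Maven coordinate matching the installed delta-spark
# wheel to spark.jars.packages before the session starts.
spark = configure_spark_with_delta_pip(builder).getOrCreate()
spark.sql("select '1'").show()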
12-21-2022 12:47 AM
Hi @Debayan Mukherjee, I think I found the root cause.
My Spark config for the Java libraries was not being upgraded in the pipeline for some reason.
After upgrading the Java libraries, the error was gone.
Anyway, thanks.
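When a pipeline can silently keep stale library pins like this, a quick runtime check makes the drift visible. Below is a minimal sketch that prints the Spark version and the jars the session actually loaded; it reaches through the private _jsc handle via py4j, so treat it as a debugging aid rather than a stable API.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

print("Spark version:", spark.version)
print("spark.jars.packages:", spark.conf.get("spark.jars.packages", "<not set>"))

# listJars() lives on the Scala SparkContext; access it through the py4j gateway.
jars = spark.sparkContext._jsc.sc().listJars()
for i in range(jars.size()):
    print("loaded jar:", jars.apply(i))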
12-26-2022 06:22 AM
Hi everyone, this information helped me, thanks.