06-07-2024 12:09 AM - edited 06-07-2024 12:10 AM
Hi! We want to upgrade the Databricks Runtime on our clusters from 13.3 LTS to 14.3 LTS. So far everything looks good except for the different error handling in the new runtime.
For example, the error on the 13.3 LTS runtime looks familiar:
while the same code on the 14.3 LTS runtime throws the following error:
Only after digging deeper into the error logs can I see that the underlying error is actually the same:
I'm not sure whether this is relevant, but we use the spark.sql() function to run the MERGE INTO command. Is there a way to restore the previous error-handling behaviour? The current errors are not informative.
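As a possible workaround while the runtime's notebook handler obscures the real error, you could wrap the spark.sql() call and print the captured error class yourself before re-raising. This is only a sketch under the assumption that the exception reaching Python is a PySpark CapturedException carrying getErrorClass()/getSqlState(); the helper name is made up for illustration.

```python
def run_sql_verbose(spark, statement):
    """Run a statement via spark.sql() and print the original Spark error
    before any custom notebook exception handler can obscure it."""
    try:
        return spark.sql(statement)
    except Exception as e:
        # On Databricks runtimes the exception is typically a PySpark
        # CapturedException, which carries getErrorClass()/getSqlState();
        # fall back to None when those methods are absent.
        error_class = getattr(e, "getErrorClass", lambda: None)()
        sql_state = getattr(e, "getSqlState", lambda: None)()
        print(f"Error class: {error_class}")
        print(f"SQL state:   {sql_state}")
        print(e)
        raise
```

Calling run_sql_verbose(spark, mergesql) instead of spark.sql(mergesql) would then show the DELTA_* error class directly in the cell output even if the notebook's own handler fails afterwards.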
06-07-2024 07:03 AM
In the notebook where this error happened, below the merge statement, after expanding the error message, the complete error is the following:
TypeError: AutoFormattedTB.structured_traceback() missing 1 required positional argument: 'evalue'
[... skipping hidden 1 frame]
File <command-77793784558808>, line 1
----> 1 merge_sql_v2(
2 target_table = "silver.test",
3 update_table = "updates_id",
4 keycolumns = "id",
5 history = False,
6 dryrun = False
7 )
File <command-77793784557616>, line 134, in merge_sql_v2(target_table, update_table, keycolumns, history, dryrun)
133 if dryrun == False:
--> 134 spark.sql(mergesql).display()
135 spark.sql(deletesql).display()
File /databricks/spark/python/pyspark/instrumentation_utils.py:47, in _wrap_function.<locals>.wrapper(*args, **kwargs)
46 try:
---> 47 res = func(*args, **kwargs)
48 logger.log_success(
49 module_name, class_name, function_name, time.perf_counter() - start, signature
50 )
File /databricks/spark/python/pyspark/sql/session.py:1748, in SparkSession.sql(self, sqlQuery, args, **kwargs)
1745 litArgs = self._jvm.PythonUtils.toArray(
1746 [_to_java_column(lit(v)) for v in (args or [])]
1747 )
-> 1748 return DataFrame(self._jsparkSession.sql(sqlQuery, litArgs), self)
1749 finally:
File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1355, in JavaMember.__call__(self, *args)
1354 answer = self.gateway_client.send_command(command)
-> 1355 return_value = get_return_value(
1356 answer, self.gateway_client, self.target_id, self.name)
1358 for temp_arg in temp_args:
File /databricks/spark/python/pyspark/errors/exceptions/captured.py:230, in capture_sql_exception.<locals>.deco(*a, **kw)
227 if not isinstance(converted, UnknownException):
228 # Hide where the exception came from that shows a non-Pythonic
229 # JVM exception message.
--> 230 raise converted from None
231 else:
UnsupportedOperationException: [DELTA_MULTIPLE_SOURCE_ROW_MATCHING_TARGET_ROW_IN_MERGE] Cannot perform Merge as multiple source rows matched and attempted to modify the same
target row in the Delta table in possibly conflicting ways. By SQL semantics of Merge,
when multiple source rows match on the same target row, the result may be ambiguous
as it is unclear which source row should be used to update or delete the matching
target row. You can preprocess the source table to eliminate the possibility of
multiple matches. Please refer to
https://docs.microsoft.com/azure/databricks/delta/merge#merge-error
During handling of the above exception, another exception occurred:
Py4JError Traceback (most recent call last)
File /databricks/python/lib/python3.10/site-packages/IPython/core/interactiveshell.py:1975, in InteractiveShell.set_custom_exc.<locals>.wrapped(self, etype, value, tb, tb_offset)
1974 try:
-> 1975 stb = handler(self,etype,value,tb,tb_offset=tb_offset)
1976 return validate_stb(stb)
File /databricks/python_shell/dbruntime/ExceptionHandler.py:26, in custom_exception_handler(shell, etype, exception, tb, tb_offset)
21 data = {
22 'errorClass': exception.getErrorClass(),
23 'messageParameters': exception.getMessageParameters(),
24 'sqlState': exception.getSqlState(),
25 }
---> 26 query_contexts = exception.getQueryContext()
27 if len(query_contexts) != 0:
File /databricks/spark/python/pyspark/errors/exceptions/captured.py:150, in CapturedException.getQueryContext(self)
147 if self._origin is not None and is_instance_of(
148 gw, self._origin, "org.apache.spark.SparkThrowable"
149 ):
--> 150 return [QueryContext(q) for q in self._origin.getQueryContext()]
151 else:
File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1355, in JavaMember.__call__(self, *args)
1354 answer = self.gateway_client.send_command(command)
-> 1355 return_value = get_return_value(
1356 answer, self.gateway_client, self.target_id, self.name)
1358 for temp_arg in temp_args:
File /databricks/spark/python/pyspark/errors/exceptions/captured.py:224, in capture_sql_exception.<locals>.deco(*a, **kw)
223 try:
--> 224 return f(*a, **kw)
225 except Py4JJavaError as e:
File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py:330, in get_return_value(answer, gateway_client, target_id, name)
329 else:
--> 330 raise Py4JError(
331 "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
332 format(target_id, ".", name, value))
333 else:
Py4JError: An error occurred while calling o514.getQueryContext. Trace:
py4j.Py4JException: Method getQueryContext([]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:344)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:352)
at py4j.Gateway.invoke(Gateway.java:297)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
at java.lang.Thread.run(Thread.java:750)
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
[... skipping hidden 1 frame]
File /databricks/python/lib/python3.10/site-packages/IPython/core/interactiveshell.py:1985, in InteractiveShell.set_custom_exc.<locals>.wrapped(self, etype, value, tb, tb_offset)
1983 print(self.InteractiveTB.stb2text(stb))
1984 print("The original exception:")
-> 1985 stb = self.InteractiveTB.structured_traceback(
1986 (etype,value,tb), tb_offset=tb_offset
1987 )
1988 return stb
Hope this helps.