<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: df.isEmpty() and df.fillna(0).isEmpty() throws error in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/df-isempty-and-df-fillna-0-isempty-throws-error/m-p/110852#M43713</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/150195"&gt;@Katalin555&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;Exit code 134 means the executor process died on &lt;CODE&gt;SIGABRT&lt;/CODE&gt; (134 = 128 + signal 6). On Databricks this most often means the JVM aborted after an &lt;CODE&gt;OutOfMemoryError&lt;/CODE&gt;, i.e. the executor ran out of memory. Have you validated the cluster metrics and driver logs?&lt;/P&gt;</description>
    <pubDate>Fri, 21 Feb 2025 12:37:22 GMT</pubDate>
    <dc:creator>Alberto_Umana</dc:creator>
    <dc:date>2025-02-21T12:37:22Z</dc:date>
    <item>
      <title>df.isEmpty() and df.fillna(0).isEmpty() throws error</title>
      <link>https://community.databricks.com/t5/data-engineering/df-isempty-and-df-fillna-0-isempty-throws-error/m-p/110825#M43705</link>
      <description>&lt;P&gt;In our code we usually use a single-user cluster on 13.3 LTS (Spark 3.4.1) when loading data from a Delta table to Azure SQL Hyperscale, and we did not experience any issues. Starting last week, however, our pipeline has been failing with the following error when checking whether the incoming dataframe is empty:&lt;/P&gt;&lt;PRE&gt;if df.isEmpty():
  File "/databricks/spark/python/pyspark/instrumentation_utils.py", line 47, in wrapper
    res = func(*args, **kwargs)
  File "/databricks/spark/python/pyspark/sql/dataframe.py", line 970, in isEmpty
    return self._jdf.isEmpty()
  File "/databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1355, in __call__
    return_value = get_return_value(
  File "/databricks/spark/python/pyspark/errors/exceptions/captured.py", line 224, in deco
    return f(*a, **kw)
  File "/databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o1092.isEmpty.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 39.0 failed 4 times, most recent failure: Lost task 0.3 in stage 39.0 (TID 7117) (10.227.2.111 executor 11): ExecutorLostFailure (executor 11 exited caused by one of the running tasks) Reason: Command exited with code 134&lt;/PRE&gt;&lt;P&gt;We tried replacing df.isEmpty() with&amp;nbsp;df.fillna(0).isEmpty(), but it fails with the same error:&lt;/P&gt;&lt;PRE&gt;if df.fillna(0).isEmpty():
  File "/databricks/spark/python/pyspark/instrumentation_utils.py", line 47, in wrapper
    res = func(*args, **kwargs)
  File "/databricks/spark/python/pyspark/sql/dataframe.py", line 970, in isEmpty
    return self._jdf.isEmpty()
  File "/databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1355, in __call__
    return_value = get_return_value(
  File "/databricks/spark/python/pyspark/errors/exceptions/captured.py", line 224, in deco
    return f(*a, **kw)
  File "/databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o1094.isEmpty.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 39.0 failed 4 times, most recent failure: Lost task 0.3 in stage 39.0 (TID 6661) (10.227.2.125 executor 3): ExecutorLostFailure (executor 3 exited caused by one of the running tasks) Reason: Command exited with code 134&lt;/PRE&gt;&lt;P&gt;We have tested the pipeline with Spark 3.4.1 and also with 3.5.0, and both give the same result. Interestingly, at one point it ran fine on 3.5.0 for 46 hours, but after a restart it failed with the error above again.&lt;/P&gt;&lt;P&gt;When testing in a simple notebook, the same check also fails on a single-user cluster with Spark 3.4.1 or 3.5.0, but runs OK on a shared cluster.&lt;/P&gt;&lt;P&gt;At the moment the pipeline has been running for 10+ hours with "if df.head(1) != 0".&lt;/P&gt;</description>
      <pubDate>Fri, 21 Feb 2025 09:11:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/df-isempty-and-df-fillna-0-isempty-throws-error/m-p/110825#M43705</guid>
      <dc:creator>Katalin555</dc:creator>
      <dc:date>2025-02-21T09:11:55Z</dc:date>
    </item>
    <item>
      <title>Re: df.isEmpty() and df.fillna(0).isEmpty() throws error</title>
      <link>https://community.databricks.com/t5/data-engineering/df-isempty-and-df-fillna-0-isempty-throws-error/m-p/110852#M43713</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/150195"&gt;@Katalin555&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;Exit code 134 means the executor process died on &lt;CODE&gt;SIGABRT&lt;/CODE&gt; (134 = 128 + signal 6). On Databricks this most often means the JVM aborted after an &lt;CODE&gt;OutOfMemoryError&lt;/CODE&gt;, i.e. the executor ran out of memory. Have you validated the cluster metrics and driver logs?&lt;/P&gt;</description>
      <pubDate>Fri, 21 Feb 2025 12:37:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/df-isempty-and-df-fillna-0-isempty-throws-error/m-p/110852#M43713</guid>
      <dc:creator>Alberto_Umana</dc:creator>
      <dc:date>2025-02-21T12:37:22Z</dc:date>
    </item>
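The exit-code reading in the reply above can be checked with the standard library: POSIX process wrappers report a process killed by signal N as exit code 128 + N, so 134 decodes to signal 6, SIGABRT — the signal a JVM raises when it aborts (for example on a fatal OutOfMemoryError). A small sketch:

```python
# Decode the executor's exit code: 128 + N means "killed by signal N".
import signal

exit_code = 134
sig = signal.Signals(exit_code - 128)
print(sig.name)  # SIGABRT (signal 6 on POSIX)
```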
    <item>
      <title>Re: df.isEmpty() and df.fillna(0).isEmpty() throws error</title>
      <link>https://community.databricks.com/t5/data-engineering/df-isempty-and-df-fillna-0-isempty-throws-error/m-p/110854#M43714</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/106294"&gt;@Alberto_Umana&lt;/a&gt;,&lt;BR /&gt;Yes, I checked and did not see any other information. We are using Driver: Standard_DS5_v2 · Workers: Standard_E16a_v4 · 1-6 workers. At the stage where the pipeline fails, the shuffle metrics were:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;Shuffle Read Size / Records:&amp;nbsp;&lt;/STRONG&gt;257.1 MiB / 49459142&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Shuffle Write Size / Records:&amp;nbsp;&lt;/STRONG&gt;16.8 MiB / 1535990&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;5 tasks on 5 nodes succeed; then the next task is retried 4 times on 4 different workers and fails with&lt;/P&gt;&lt;PRE&gt;ExecutorLostFailure (executor 5 exited caused by one of the running tasks) Reason: Command exited with code 134&lt;/PRE&gt;&lt;P&gt;Memory utilization looks OK on all of them. One example:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Katalin555_0-1740141827630.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/15002i9B35FA0874E6D49C/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Katalin555_0-1740141827630.png" alt="Katalin555_0-1740141827630.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 21 Feb 2025 12:50:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/df-isempty-and-df-fillna-0-isempty-throws-error/m-p/110854#M43714</guid>
      <dc:creator>Katalin555</dc:creator>
      <dc:date>2025-02-21T12:50:41Z</dc:date>
    </item>
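A back-of-envelope check on the shuffle metrics quoted above (a sanity calculation, not a diagnosis): the reported volumes work out to only a handful of bytes per record, so the shuffle data itself looks far too small to exhaust an E16a_v4 worker's memory — consistent with the failure being something other than plain data volume.

```python
# Back-of-envelope: average shuffle record sizes from the numbers in the post.
read_bytes = 257.1 * 1024**2          # Shuffle Read Size: 257.1 MiB
read_records = 49_459_142             # Shuffle Read Records
write_bytes = 16.8 * 1024**2          # Shuffle Write Size: 16.8 MiB
write_records = 1_535_990             # Shuffle Write Records

print(round(read_bytes / read_records, 2))    # ~5.45 bytes per record read
print(round(write_bytes / write_records, 2))  # ~11.47 bytes per record written
```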
  </channel>
</rss>

