df.isEmpty() and df.fillna(0).isEmpty() throw an error
02-21-2025 01:11 AM
Our pipeline normally runs on a single user cluster with Databricks Runtime 13.3 LTS (Spark 3.4.1) to load data from a Delta table into Azure SQL Hyperscale, and we had not experienced any issues. Starting last week, however, the pipeline has been failing with the following error when checking whether the incoming DataFrame is empty:
    if df.isEmpty():
      File "/databricks/spark/python/pyspark/instrumentation_utils.py", line 47, in wrapper
        res = func(*args, **kwargs)
      File "/databricks/spark/python/pyspark/sql/dataframe.py", line 970, in isEmpty
        return self._jdf.isEmpty()
      File "/databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1355, in __call__
        return_value = get_return_value(
      File "/databricks/spark/python/pyspark/errors/exceptions/captured.py", line 224, in deco
        return f(*a, **kw)
      File "/databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
        raise Py4JJavaError(
    py4j.protocol.Py4JJavaError: An error occurred while calling o1092.isEmpty.
    : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 39.0 failed 4 times, most recent failure: Lost task 0.3 in stage 39.0 (TID 7117) (10.227.2.111 executor 11): ExecutorLostFailure (executor 11 exited caused by one of the running tasks) Reason: Command exited with code 134
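For context, the failing step is essentially of this shape (a sketch only, not our exact code; the table name, JDBC URL, and credentials below are placeholders):

    # `spark` is the SparkSession that Databricks provides in jobs and notebooks.
    df = spark.read.table("catalog.schema.source_delta_table")  # placeholder Delta table name

    if df.isEmpty():  # the call that now aborts with exit code 134
        print("No incoming rows, nothing to load")
    else:
        (df.write
           .format("jdbc")
           .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<database>")
           .option("dbtable", "dbo.target_table")
           .option("user", "<user>")
           .option("password", "<password>")
           .mode("append")
           .save())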
We have tried replacing df.isEmpty() with df.fillna(0).isEmpty(), but we still get the same error:
    if df.fillna(0).isEmpty():
      File "/databricks/spark/python/pyspark/instrumentation_utils.py", line 47, in wrapper
        res = func(*args, **kwargs)
      File "/databricks/spark/python/pyspark/sql/dataframe.py", line 970, in isEmpty
        return self._jdf.isEmpty()
      File "/databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1355, in __call__
        return_value = get_return_value(
      File "/databricks/spark/python/pyspark/errors/exceptions/captured.py", line 224, in deco
        return f(*a, **kw)
      File "/databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
        raise Py4JJavaError(
    py4j.protocol.Py4JJavaError: An error occurred while calling o1094.isEmpty.
    : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 39.0 failed 4 times, most recent failure: Lost task 0.3 in stage 39.0 (TID 6661) (10.227.2.125 executor 3): ExecutorLostFailure (executor 3 exited caused by one of the running tasks) Reason: Command exited with code 134
We have tested the pipeline with both Spark 3.4.1 and 3.5.0, and both produce the same result. What is interesting is that at one point it ran fine on 3.5.0 for 46 hours, but after a restart it failed with the above error again.
When testing in a simple notebook, the same kind of check also fails on a single user cluster with Spark 3.4.1 or 3.5.0, but runs OK on a shared cluster.
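A minimal sketch of that notebook check (just an illustration; the table name is a placeholder):

    # `spark` is the SparkSession already available in a Databricks notebook.
    df = spark.read.table("catalog.schema.some_delta_table")  # placeholder table name

    # Fails with exit code 134 on the single user cluster, works on the shared cluster.
    print(df.isEmpty())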
02-21-2025 04:37 AM
Hi @Katalin555,
Exit code 134 means the executor process was killed by SIGABRT (128 + 6), which on Spark executors typically points to the JVM aborting because it ran out of memory. Have you validated the cluster metrics and driver logs?
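One way to narrow it down (just a sketch, under the assumption that the crash happens during the take(1)-style job that isEmpty() runs) is to try equivalent checks and see whether they fail the same way:

    # Diagnostic sketch (an assumption, not a confirmed fix): if these equivalent
    # checks also die with exit code 134, the crash is in computing the DataFrame
    # itself rather than in DataFrame.isEmpty() specifically.
    no_rows = len(df.take(1)) == 0        # pull at most one row to the driver
    no_rows = df.limit(1).count() == 0    # explicit LIMIT 1 followed by a count

If those fail in the same way, the memory pressure would appear to come from evaluating the DataFrame, not from the emptiness check.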
02-21-2025 04:50 AM
Hi @Alberto_Umana,
Yes, I checked and did not see any other information. We are using Driver: Standard_DS5_v2 · Workers: Standard_E16a_v4 · 1-6 workers. At the stage where the pipeline fails, the shuffle information was:
- Shuffle Read Size / Records: 257.1 MiB / 49459142
- Shuffle Write Size / Records: 16.8 MiB / 1535990
Five tasks on five nodes succeed, then the next task is retried 4 times on 4 different workers and fails on every attempt with:
    ExecutorLostFailure (executor 5 exited caused by one of the running tasks) Reason: Command exited with code 134
Memory utilization looks OK on all of them.
One example: (memory utilization screenshot)