Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

df.isEmpty() and df.fillna(0).isEmpty() throw an error

Katalin555
New Contributor II

In our pipeline we normally use a single-user cluster on Databricks Runtime 13.3 LTS (Spark 3.4.1) to load data from a Delta table into Azure SQL Hyperscale, and we had not experienced any issues. Starting last week, however, the pipeline has been failing with the following error when checking whether the incoming DataFrame is empty:

if df.isEmpty():
  File "/databricks/spark/python/pyspark/instrumentation_utils.py", line 47, in wrapper
    res = func(*args, **kwargs)
  File "/databricks/spark/python/pyspark/sql/dataframe.py", line 970, in isEmpty
    return self._jdf.isEmpty()
  File "/databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1355, in __call__
    return_value = get_return_value(
  File "/databricks/spark/python/pyspark/errors/exceptions/captured.py", line 224, in deco
    return f(*a, **kw)
  File "/databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o1092.isEmpty.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 39.0 failed 4 times, most recent failure: Lost task 0.3 in stage 39.0 (TID 7117) (10.227.2.111 executor 11): ExecutorLostFailure (executor 11 exited caused by one of the running tasks) Reason: Command exited with code 134
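For context, the shape of the pipeline is roughly the following (a sketch only; the table name, JDBC URL, and options below are hypothetical placeholders, not our actual code):

df = spark.read.table("catalog.schema.source_table")          # hypothetical Delta source
if not df.isEmpty():                                          # the check that now fails
    (df.write
       .format("jdbc")
       .option("url", "jdbc:sqlserver://<server>.database.windows.net;databaseName=<db>")  # placeholder
       .option("dbtable", "dbo.target_table")                 # placeholder
       .option("user", "<user>")
       .option("password", "<password>")
       .mode("append")
       .save())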

We tried replacing df.isEmpty() with df.fillna(0).isEmpty(), but we still get the same error:

if df.fillna(0).isEmpty():
  File "/databricks/spark/python/pyspark/instrumentation_utils.py", line 47, in wrapper
    res = func(*args, **kwargs)
  File "/databricks/spark/python/pyspark/sql/dataframe.py", line 970, in isEmpty
    return self._jdf.isEmpty()
  File "/databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1355, in __call__
    return_value = get_return_value(
  File "/databricks/spark/python/pyspark/errors/exceptions/captured.py", line 224, in deco
    return f(*a, **kw)
  File "/databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o1094.isEmpty.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 39.0 failed 4 times, most recent failure: Lost task 0.3 in stage 39.0 (TID 6661) (10.227.2.125 executor 3): ExecutorLostFailure (executor 3 exited caused by one of the running tasks) Reason: Command exited with code 134

We have tested the pipeline with both Spark 3.4.1 and 3.5.0, and both give the same result. Interestingly, at one point it ran fine on 3.5.0 for 46 hours, but after a restart it failed with the above error again.

When testing in a simple notebook, the following check also fails on a single-user cluster with Spark 3.4.1 or 3.5.0, but runs OK on a shared cluster:

 
At the moment the pipeline has been running for 10+ hours with "if df.head(1) != 0" as the check.
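A note on that workaround: df.head(1) returns a Python list of Row objects, so comparing it to 0 always evaluates as true. A minimal sketch of the same idea written as an explicit length test (df is the incoming DataFrame; this only swaps the emptiness check and does not address the underlying executor crash):

rows = df.head(1)          # list of Row objects, empty if the DataFrame has no rows
if len(rows) > 0:
    ...                    # continue with the load into Azure SQL Hyperscale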
2 REPLIES

Alberto_Umana
Databricks Employee

Hi @Katalin555,

Exit code 134 corresponds to SIGABRT (128 + 6) and typically indicates an OutOfMemoryError, meaning the executor process ran out of memory and was aborted. Have you checked the cluster metrics and the driver logs?
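If it helps, here is a quick sketch for surfacing the memory-related settings actually in effect on the cluster (run in a notebook cell attached to the same cluster; keys never set explicitly will print "not set"):

# Print memory-related Spark settings in effect on this cluster
conf = spark.sparkContext.getConf()
for key in ("spark.executor.memory",
            "spark.executor.memoryOverhead",
            "spark.memory.fraction"):
    print(key, "=", conf.get(key, "not set"))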

Katalin555
New Contributor II

Hi @Alberto_Umana,
Yes, I checked and did not find any additional information. We are using Driver: Standard_DS5_v2 · Workers: Standard_E16a_v4 · 1-6 workers. At the stage where the pipeline fails, the shuffle metrics were:

  • Shuffle Read Size / Records: 257.1 MiB / 49459142
  • Shuffle Write Size / Records: 16.8 MiB / 1535990

Five tasks on five nodes succeed, then the next task is retried 4 times on 4 different workers and fails each time with

ExecutorLostFailure (executor 5 exited caused by one of the running tasks) Reason: Command exited with code 134

Memory utilization looks OK on all of them:

One example: [attached screenshot of cluster memory-utilization metrics: Katalin555_0-1740141827630.png]
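Given that a single task keeps failing while the others succeed, one thing worth checking is whether one partition carries a disproportionate share of the ~49M shuffled records. A rough diagnostic sketch (df is the incoming DataFrame; this is only an idea for spotting skew, not code from the failing job):

from pyspark.sql.functions import spark_partition_id

# Count rows per partition to spot a heavily skewed partition
(df.groupBy(spark_partition_id().alias("partition_id"))
   .count()
   .orderBy("count", ascending=False)
   .show(10))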

 

 
