<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: df.isEmpty() and df.fillna(0).isEmpty() throws error in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/df-isempty-and-df-fillna-0-isempty-throws-error/m-p/110852#M43713</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/150195"&gt;@Katalin555&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;Exit code 134 means the executor process died on &lt;CODE&gt;SIGABRT&lt;/CODE&gt; (134 = 128 + signal 6). On Databricks this most often means the JVM aborted after an &lt;CODE&gt;OutOfMemoryError&lt;/CODE&gt;, i.e. the executor ran out of memory. Have you validated the cluster metrics and driver logs?&lt;/P&gt;</description>
    <pubDate>Fri, 21 Feb 2025 12:37:22 GMT</pubDate>
    <dc:creator>Alberto_Umana</dc:creator>
    <dc:date>2025-02-21T12:37:22Z</dc:date>
    <item>
      <title>df.isEmpty() and df.fillna(0).isEmpty() throws error</title>
      <link>https://community.databricks.com/t5/data-engineering/df-isempty-and-df-fillna-0-isempty-throws-error/m-p/110825#M43705</link>
      <description>&lt;P&gt;In our code we usually use a single-user cluster on 13.3 LTS (Spark 3.4.1) when loading data from a Delta table to Azure SQL Hyperscale, and we did not experience any issues. Starting last week, however, our pipeline has been failing with the following error when checking whether the incoming dataframe is empty:&lt;/P&gt;&lt;PRE&gt;if df.isEmpty():
  File "/databricks/spark/python/pyspark/instrumentation_utils.py", line 47, in wrapper
    res = func(*args, **kwargs)
  File "/databricks/spark/python/pyspark/sql/dataframe.py", line 970, in isEmpty
    return self._jdf.isEmpty()
  File "/databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1355, in __call__
    return_value = get_return_value(
  File "/databricks/spark/python/pyspark/errors/exceptions/captured.py", line 224, in deco
    return f(*a, **kw)
  File "/databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o1092.isEmpty.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 39.0 failed 4 times, most recent failure: Lost task 0.3 in stage 39.0 (TID 7117) (10.227.2.111 executor 11): ExecutorLostFailure (executor 11 exited caused by one of the running tasks) Reason: Command exited with code 134&lt;/PRE&gt;&lt;P&gt;We tried replacing df.isEmpty() with&amp;nbsp;df.fillna(0).isEmpty(), but it fails with the same error:&lt;/P&gt;&lt;PRE&gt;if df.fillna(0).isEmpty():
  File "/databricks/spark/python/pyspark/instrumentation_utils.py", line 47, in wrapper
    res = func(*args, **kwargs)
  File "/databricks/spark/python/pyspark/sql/dataframe.py", line 970, in isEmpty
    return self._jdf.isEmpty()
  File "/databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1355, in __call__
    return_value = get_return_value(
  File "/databricks/spark/python/pyspark/errors/exceptions/captured.py", line 224, in deco
    return f(*a, **kw)
  File "/databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o1094.isEmpty.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 39.0 failed 4 times, most recent failure: Lost task 0.3 in stage 39.0 (TID 6661) (10.227.2.125 executor 3): ExecutorLostFailure (executor 3 exited caused by one of the running tasks) Reason: Command exited with code 134&lt;/PRE&gt;&lt;P&gt;We have tested the pipeline with Spark 3.4.1 and also with 3.5.0, and both give the same result. Interestingly, at one point it ran fine on 3.5.0 for 46 hours, but after a restart it failed with the error above again.&lt;/P&gt;&lt;P&gt;When testing in a simple notebook, the same check also fails on a single-user cluster with Spark 3.4.1 or 3.5.0, but runs OK on a shared cluster.&lt;/P&gt;&lt;P&gt;At the moment the pipeline has been running for 10+ hours with "if df.head(1) != 0".&lt;/P&gt;</description>
      <pubDate>Fri, 21 Feb 2025 09:11:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/df-isempty-and-df-fillna-0-isempty-throws-error/m-p/110825#M43705</guid>
      <dc:creator>Katalin555</dc:creator>
      <dc:date>2025-02-21T09:11:55Z</dc:date>
    </item>
    <item>
      <title>Re: df.isEmpty() and df.fillna(0).isEmpty() throws error</title>
      <link>https://community.databricks.com/t5/data-engineering/df-isempty-and-df-fillna-0-isempty-throws-error/m-p/110852#M43713</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/150195"&gt;@Katalin555&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;Exit code 134 means the executor process died on &lt;CODE&gt;SIGABRT&lt;/CODE&gt; (134 = 128 + signal 6). On Databricks this most often means the JVM aborted after an &lt;CODE&gt;OutOfMemoryError&lt;/CODE&gt;, i.e. the executor ran out of memory. Have you validated the cluster metrics and driver logs?&lt;/P&gt;</description>
      <pubDate>Fri, 21 Feb 2025 12:37:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/df-isempty-and-df-fillna-0-isempty-throws-error/m-p/110852#M43713</guid>
      <dc:creator>Alberto_Umana</dc:creator>
      <dc:date>2025-02-21T12:37:22Z</dc:date>
    </item>
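The exit-code reading in the reply above can be checked with the standard library: POSIX process wrappers report a process killed by signal N as exit code 128 + N, so 134 decodes to signal 6, SIGABRT — the signal a JVM raises when it aborts (for example on a fatal OutOfMemoryError). A small sketch:

```python
# Decode the executor's exit code: 128 + N means "killed by signal N".
import signal

exit_code = 134
sig = signal.Signals(exit_code - 128)
print(sig.name)  # SIGABRT (signal 6 on POSIX)
```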
    <item>
      <title>Re: df.isEmpty() and df.fillna(0).isEmpty() throws error</title>
      <link>https://community.databricks.com/t5/data-engineering/df-isempty-and-df-fillna-0-isempty-throws-error/m-p/110854#M43714</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/106294"&gt;@Alberto_Umana&lt;/a&gt;,&lt;BR /&gt;Yes, I checked and did not see any other information. We are using Driver: Standard_DS5_v2 · Workers: Standard_E16a_v4 · 1-6 workers. At the stage where the pipeline fails, the shuffle metrics were:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;Shuffle Read Size / Records:&amp;nbsp;&lt;/STRONG&gt;257.1 MiB / 49459142&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Shuffle Write Size / Records:&amp;nbsp;&lt;/STRONG&gt;16.8 MiB / 1535990&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;5 tasks on 5 nodes succeed; then the next task is retried 4 times on 4 different workers and fails with&lt;/P&gt;&lt;PRE&gt;ExecutorLostFailure (executor 5 exited caused by one of the running tasks) Reason: Command exited with code 134&lt;/PRE&gt;&lt;P&gt;Memory utilization looks OK on all of them. One example:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Katalin555_0-1740141827630.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/15002i9B35FA0874E6D49C/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Katalin555_0-1740141827630.png" alt="Katalin555_0-1740141827630.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 21 Feb 2025 12:50:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/df-isempty-and-df-fillna-0-isempty-throws-error/m-p/110854#M43714</guid>
      <dc:creator>Katalin555</dc:creator>
      <dc:date>2025-02-21T12:50:41Z</dc:date>
    </item>
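A back-of-envelope check on the shuffle metrics quoted above (a sanity calculation, not a diagnosis): the reported volumes work out to only a handful of bytes per record, so the shuffle data itself looks far too small to exhaust an E16a_v4 worker's memory — consistent with the failure being something other than plain data volume.

```python
# Back-of-envelope: average shuffle record sizes from the numbers in the post.
read_bytes = 257.1 * 1024**2          # Shuffle Read Size: 257.1 MiB
read_records = 49_459_142             # Shuffle Read Records
write_bytes = 16.8 * 1024**2          # Shuffle Write Size: 16.8 MiB
write_records = 1_535_990             # Shuffle Write Records

print(round(read_bytes / read_records, 2))    # ~5.45 bytes per record read
print(round(write_bytes / write_records, 2))  # ~11.47 bytes per record written
```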
  </channel>
</rss>

