<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic dataframe.rdd.isEmpty() is throwing error in 9.1 LTS in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/dataframe-rdd-isempty-is-throwing-error-in-9-1-lts/m-p/31429#M22880</link>
    <description>&lt;P&gt;Loaded a CSV file with five columns into a dataframe, then added around 15 more columns using the dataframe.withColumn method.&lt;/P&gt;&lt;P&gt;After adding these columns, running df.rdd.isEmpty() throws the error below.&lt;/P&gt;&lt;P&gt;&lt;B&gt;org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 32.0 failed 4 times, most recent failure: Lost task 0.3 in stage 32.0 (TID 28) (10.139.64.4 executor 9): ExecutorLostFailure (executor 9 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.&lt;/B&gt;&lt;/P&gt;&lt;P&gt;Any idea what the issue is?&lt;/P&gt;</description>
    <pubDate>Wed, 19 Jan 2022 06:00:40 GMT</pubDate>
    <dc:creator>thushar</dc:creator>
    <dc:date>2022-01-19T06:00:40Z</dc:date>
    <item>
      <title>dataframe.rdd.isEmpty() is throwing error in 9.1 LTS</title>
      <link>https://community.databricks.com/t5/data-engineering/dataframe-rdd-isempty-is-throwing-error-in-9-1-lts/m-p/31429#M22880</link>
      <description>&lt;P&gt;Loaded a CSV file with five columns into a dataframe, then added around 15 more columns using the dataframe.withColumn method.&lt;/P&gt;&lt;P&gt;After adding these columns, running df.rdd.isEmpty() throws the error below.&lt;/P&gt;&lt;P&gt;&lt;B&gt;org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 32.0 failed 4 times, most recent failure: Lost task 0.3 in stage 32.0 (TID 28) (10.139.64.4 executor 9): ExecutorLostFailure (executor 9 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.&lt;/B&gt;&lt;/P&gt;&lt;P&gt;Any idea what the issue is?&lt;/P&gt;</description>
      <pubDate>Wed, 19 Jan 2022 06:00:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dataframe-rdd-isempty-is-throwing-error-in-9-1-lts/m-p/31429#M22880</guid>
      <dc:creator>thushar</dc:creator>
      <dc:date>2022-01-19T06:00:40Z</dc:date>
    </item>
    <item>
      <title>Re: dataframe.rdd.isEmpty() is throwing error in 9.1 LTS</title>
      <link>https://community.databricks.com/t5/data-engineering/dataframe-rdd-isempty-is-throwing-error-in-9-1-lts/m-p/31430#M22881</link>
      <description>&lt;P&gt;Hello again, @Thushar R​&amp;nbsp;- I'm sorry to hear that you're having this difficulty also. Let's give the community a chance to respond first. Thanks in advance for your patience.&lt;/P&gt;</description>
      <pubDate>Wed, 19 Jan 2022 16:40:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dataframe-rdd-isempty-is-throwing-error-in-9-1-lts/m-p/31430#M22881</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-01-19T16:40:28Z</dc:date>
    </item>
    <item>
      <title>Re: dataframe.rdd.isEmpty() is throwing error in 9.1 LTS</title>
      <link>https://community.databricks.com/t5/data-engineering/dataframe-rdd-isempty-is-throwing-error-in-9-1-lts/m-p/31431#M22882</link>
      <description>&lt;P&gt;Please check your logs, as it could be a different issue.&lt;/P&gt;&lt;P&gt;Please also try bool(df.head(1)) instead.&lt;/P&gt;</description>
      <pubDate>Thu, 20 Jan 2022 11:45:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dataframe-rdd-isempty-is-throwing-error-in-9-1-lts/m-p/31431#M22882</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2022-01-20T11:45:32Z</dc:date>
    </item>
    <item>
      <title>Re: dataframe.rdd.isEmpty() is throwing error in 9.1 LTS</title>
      <link>https://community.databricks.com/t5/data-engineering/dataframe-rdd-isempty-is-throwing-error-in-9-1-lts/m-p/31432#M22883</link>
      <description>&lt;P&gt;Thanks for the workaround. But why does this particular piece of code fail on the 9.0 LTS runtime while it runs on 8.3 without issues? Any idea? Please see the code below.&lt;/P&gt;&lt;PRE&gt;from pyspark.sql.functions import lit, col, row_number, floor, trim

df = spark.read.option("header", "true").csv(filePath)

df2 = df.select(col("cc"), col("ac"), col("an"),
                col("ag"), col("at")).distinct()

lstOfMissingColumns = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7',
                       'col8', 'col9', 'col10', 'col11', 'col12', 'col13',
                       'col14', 'col15', 'col16', 'col17']

for c in lstOfMissingColumns:
    df2 = df2.withColumn(c, lit(''))

df2.rdd.isEmpty()&lt;/PRE&gt;</description>
      <pubDate>Fri, 21 Jan 2022 07:57:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dataframe-rdd-isempty-is-throwing-error-in-9-1-lts/m-p/31432#M22883</guid>
      <dc:creator>thushar</dc:creator>
      <dc:date>2022-01-21T07:57:41Z</dc:date>
    </item>
    <item>
      <title>Re: dataframe.rdd.isEmpty() is throwing error in 9.1 LTS</title>
      <link>https://community.databricks.com/t5/data-engineering/dataframe-rdd-isempty-is-throwing-error-in-9-1-lts/m-p/31433#M22884</link>
      <description>&lt;P&gt;@Thushar R​&amp;nbsp;- Thank you for your patience. We are looking for the best person to help you. &lt;/P&gt;</description>
      <pubDate>Wed, 16 Feb 2022 17:04:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dataframe-rdd-isempty-is-throwing-error-in-9-1-lts/m-p/31433#M22884</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-02-16T17:04:24Z</dc:date>
    </item>
    <item>
      <title>Re: dataframe.rdd.isEmpty() is throwing error in 9.1 LTS</title>
      <link>https://community.databricks.com/t5/data-engineering/dataframe-rdd-isempty-is-throwing-error-in-9-1-lts/m-p/31434#M22885</link>
      <description>&lt;P&gt;Hi @Thushar R​,&lt;/P&gt;&lt;P&gt;Are you using the same CSV file?&lt;/P&gt;&lt;P&gt;The error message is "Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages", which could be an OOM error. How big is your CSV file? Have you checked executor 9's logs?&lt;/P&gt;</description>
      <pubDate>Thu, 24 Feb 2022 01:11:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dataframe-rdd-isempty-is-throwing-error-in-9-1-lts/m-p/31434#M22885</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2022-02-24T01:11:43Z</dc:date>
    </item>
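Since the reply above asks how big the CSV file is, a quick way to answer that before loading is to check the file size directly. A sketch using a local path (the path and sample content are hypothetical stand-ins for the thread's filePath; on DBFS the path or API may differ):

```python
import os

# Hypothetical local file standing in for the thread's filePath.
path = "/tmp/sample.csv"
with open(path, "w") as f:
    f.write("cc,ac,an,ag,at\n")
    f.write("c1,a1,n1,g1,t1\n")

# Report the on-disk size; very large inputs make executor OOMs more likely.
size_mb = os.path.getsize(path) / (1024 * 1024)
print("input size: %.3f MB" % size_mb)

os.remove(path)
```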
  </channel>
</rss>

