<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: ExecutorLostFailure: Remote RPC Client Disassociated in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29366#M21100</link>
<description>&lt;P&gt;According to &lt;A href="https://docs.databricks.com/jobs.html#jar-job-tips" target="_blank"&gt;https://docs.databricks.com/jobs.html#jar-job-tips&lt;/A&gt;:&lt;/P&gt;
&lt;P&gt;&lt;I&gt;"Job output, such as log output emitted to stdout, is subject to a 20MB size limit. If the total output has a larger size, the run will be canceled and marked as failed."&lt;/I&gt;&lt;/P&gt;
&lt;P&gt;That was my problem. To "&lt;I&gt;fix it&lt;/I&gt;" I just set the logging level to ERROR:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;val sc = SparkContext.getOrCreate(conf)
sc.setLogLevel("ERROR")&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;That solved it.&lt;/P&gt;</description>
    <pubDate>Tue, 10 Dec 2019 16:36:36 GMT</pubDate>
    <dc:creator>RodrigoDe_Freit</dc:creator>
    <dc:date>2019-12-10T16:36:36Z</dc:date>
    <item>
      <title>ExecutorLostFailure: Remote RPC Client Disassociated</title>
      <link>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29352#M21086</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;This is an expensive and long-running job that gets about halfway done before failing. The stack trace is included below, but here is the salient part: &lt;/P&gt;
&lt;P&gt;&lt;B&gt;Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 4881 in stage 1.0 failed 4 times, most recent failure: Lost task 4881.3 in stage 1.0 (TID 7305, 10.37.129.129): ExecutorLostFailure (executor 116 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.&lt;/B&gt;&lt;/P&gt;
&lt;P&gt;This job has been running fine for months up to this point. I have tried increasing the node size, but I am still receiving this error. I saw from a Google search that this might be YARN not having enough provisioned memory, but I was under the impression that this is all configured under the hood by Databricks. Should I try tweaking these values? If so, which ones?&lt;/P&gt;
&lt;P&gt;Here is the full stack trace:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
&amp;lt;ipython-input-2-b42178413d3f&amp;gt; in &amp;lt;module&amp;gt;()
    254     )
    255
--&amp;gt; 256 final_data.write.format('com.databricks.spark.redshift').option('preactions', delete_stmt).option('url', REDSHIFT_URL).option('dbtable', load_table).option('tempdir', REDSHIFT_TEMPDIR + '/courses_monthly').mode("append").save()

/databricks/spark/python/pyspark/sql/readwriter.py in save(self, path, format, mode, partitionBy, **options)
    528             self.format(format)
    529         if path is None:
--&amp;gt; 530             self._jwrite.save()
    531         else:
    532             self._jwrite.save(path)

/databricks/spark/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
    931         answer = self.gateway_client.send_command(command)
    932         return_value = get_return_value(
--&amp;gt; 933             answer, self.gateway_client, self.target_id, self.name)
    934
    935         for temp_arg in temp_args:

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
     61     def deco(*a, **kw):
     62         try:
--&amp;gt; 63             return f(*a, **kw)
     64         except py4j.protocol.Py4JJavaError as e:
     65             s = e.java_exception.toString()

/databricks/spark/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    310                 raise Py4JJavaError(
    311                     "An error occurred while calling {0}{1}{2}.\n".
--&amp;gt; 312                     format(target_id, ".", name), value)
    313             else:
    314                 raise Py4JError(&lt;/CODE&gt;&lt;/PRE&gt;&lt;PRE&gt;&lt;CODE&gt;Py4JJavaError: An error occurred while calling o305.save.
: org.apache.spark.SparkException: Job aborted.
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:149)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
    at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:511)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
    at com.databricks.spark.redshift.RedshiftWriter.unloadData(RedshiftWriter.scala:278)
    at com.databricks.spark.redshift.RedshiftWriter.saveToRedshift(RedshiftWriter.scala:346)
    at com.databricks.spark.redshift.DefaultSource.createRelation(DefaultSource.scala:106)
    at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:443)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:211)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 4881 in stage 1.0 failed 4 times, most recent failure: Lost task 4881.3 in stage 1.0 (TID 7305, 10.37.129.129): ExecutorLostFailure (executor 116 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1452)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1440)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1439)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1439)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1665)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1620)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1609)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:632)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1868)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1881)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1901)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:143)
    ... 34 more &lt;/CODE&gt;&lt;/PRE&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 03 Jan 2017 23:42:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29352#M21086</guid>
      <dc:creator>McKayHarris</dc:creator>
      <dc:date>2017-01-03T23:42:14Z</dc:date>
    </item>
    <item>
      <title>Re: ExecutorLostFailure: Remote RPC Client Disassociated</title>
      <link>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29353#M21087</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Hey McKay, there could be bad import data. Databricks would not have changed anything about this job; is it possible someone changed the settings, or that there was a spot price spike during the last run?&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 04 Jan 2017 16:55:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29353#M21087</guid>
      <dc:creator>Bill_Chambers</dc:creator>
      <dc:date>2017-01-04T16:55:45Z</dc:date>
    </item>
    <item>
      <title>Re: ExecutorLostFailure: Remote RPC Client Disassociated</title>
      <link>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29354#M21088</link>
      <description>&lt;P&gt;@Bill Chambers&amp;nbsp;Hey, the only thing I changed was improving some of the date-time logic used to specify which directory to pull the data from. It has failed the same way across 6-7 runs, so it's not spot price spikes. Any suggestions on what to try?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 04 Jan 2017 18:36:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29354#M21088</guid>
      <dc:creator>McKayHarris</dc:creator>
      <dc:date>2017-01-04T18:36:04Z</dc:date>
    </item>
    <item>
      <title>Re: ExecutorLostFailure: Remote RPC Client Disassociated</title>
      <link>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29355#M21089</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I am seeing a similar failure for my job. It's frustrating that there is no solution so far. It seems like a Databricks bug, as everything works on EMR.&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 04 Jan 2017 19:42:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29355#M21089</guid>
      <dc:creator>niravshah3</dc:creator>
      <dc:date>2017-01-04T19:42:51Z</dc:date>
    </item>
    <item>
      <title>Re: ExecutorLostFailure: Remote RPC Client Disassociated</title>
      <link>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29356#M21090</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I took a look at your other post about a "similar" problem, and that other issue appears to be library-related: incompatible Scala versions being used. I'd recommend reviewing your configuration and understanding the versions of the libraries in use.&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 04 Jan 2017 20:16:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29356#M21090</guid>
      <dc:creator>miklos</dc:creator>
      <dc:date>2017-01-04T20:16:25Z</dc:date>
    </item>
    <item>
      <title>Re: ExecutorLostFailure: Remote RPC Client Disassociated</title>
      <link>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29357#M21091</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I am getting the same error as well. My job runs for a very long time and then fails with the same error. I am using Spark 2.0.0 and EMR 5.0.0. I have looked everywhere and found no solution.&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 11 Jan 2017 11:45:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29357#M21091</guid>
      <dc:creator>Braj259</dc:creator>
      <dc:date>2017-01-11T11:45:36Z</dc:date>
    </item>
    <item>
      <title>Re: ExecutorLostFailure: Remote RPC Client Disassociated</title>
      <link>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29358#M21092</link>
      <description>&lt;P&gt;I got this resolved by removing the caching of DataFrames at various stages, @McKay Harris&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 31 May 2019 16:43:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29358#M21092</guid>
      <dc:creator>NithinAP</dc:creator>
      <dc:date>2019-05-31T16:43:49Z</dc:date>
    </item>
    <item>
      <title>Re: ExecutorLostFailure: Remote RPC Client Disassociated</title>
      <link>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29359#M21093</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I got a similar error on Runtime 5.5 LTS with Spark 2.4.3 when calling a spaCy NLP model.&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 18 Sep 2019 08:35:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29359#M21093</guid>
      <dc:creator>AntonBaranau</dc:creator>
      <dc:date>2019-09-18T08:35:36Z</dc:date>
    </item>
    <item>
      <title>Re: ExecutorLostFailure: Remote RPC Client Disassociated</title>
      <link>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29360#M21094</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;We are still facing this issue. Is there any solution?&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 30 Sep 2019 11:16:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29360#M21094</guid>
      <dc:creator>shibirajar</dc:creator>
      <dc:date>2019-09-30T11:16:13Z</dc:date>
    </item>
    <item>
      <title>Re: ExecutorLostFailure: Remote RPC Client Disassociated</title>
      <link>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29361#M21095</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Have you tried enabling Apache Arrow in your job? This may improve memory utilization for your job. You can do that by adding this snippet to the top of your script, or by setting it as part of the Spark config for your job:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;# Enable Arrow-based columnar data transfers
spark.conf.set("spark.sql.execution.arrow.enabled", "true")&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;See the docs here: &lt;A href="https://docs.databricks.com/spark/latest/spark-sql/spark-pandas.html#optimizing-conversion-between-spark-and-pandas-dataframes" target="_blank"&gt;https://docs.databricks.com/spark/latest/spark-sql/spark-pandas.html#optimizing-conversion-between-spark-and-pandas-dataframes&lt;/A&gt;&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 10 Oct 2019 11:53:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29361#M21095</guid>
      <dc:creator>RafiKurlansik</dc:creator>
      <dc:date>2019-10-10T11:53:27Z</dc:date>
    </item>
    <item>
      <title>Re: ExecutorLostFailure: Remote RPC Client Disassociated</title>
      <link>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29362#M21096</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I'm having this error too. After months of the model working, I tweaked the data and now I get this "RPC client disconnected probably due to containers exceeding thresholds, bla bla" issue&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 29 Oct 2019 17:18:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29362#M21096</guid>
      <dc:creator>TheodoreVadpey</dc:creator>
      <dc:date>2019-10-29T17:18:09Z</dc:date>
    </item>
    <item>
      <title>Re: ExecutorLostFailure: Remote RPC Client Disassociated</title>
      <link>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29363#M21097</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Hello, did you ever find a resolution to this issue?&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 29 Oct 2019 17:21:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29363#M21097</guid>
      <dc:creator>TheodoreVadpey</dc:creator>
      <dc:date>2019-10-29T17:21:45Z</dc:date>
    </item>
    <item>
      <title>Re: ExecutorLostFailure: Remote RPC Client Disassociated</title>
      <link>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29364#M21098</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I got the same error when saving my DataFrame to S3, although other DataFrames saved successfully. I found a method to avoid the problem in my case.&lt;/P&gt;
&lt;P&gt;Define a function that saves the DataFrame to HDFS first, and then uses the saved Parquet file to create a new DataFrame. After this, the new DataFrame can be saved to S3 successfully.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;def save_to_hdfs_first(id, df_save):
    df_save.write.mode('overwrite').parquet('/tmp/' + id + '.parquet')
    df_new = spark.read.parquet('/tmp/' + id + '.parquet')
    return df_new&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;I don't know whether it is a memory problem or a partitioning problem, but this method did solve mine.&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 11 Nov 2019 01:26:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29364#M21098</guid>
      <dc:creator>fisheep</dc:creator>
      <dc:date>2019-11-11T01:26:23Z</dc:date>
    </item>
    <item>
      <title>Re: ExecutorLostFailure: Remote RPC Client Disassociated</title>
      <link>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29365#M21099</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I ran into the same exception using the DataFrame, and changing the cluster configuration didn't help. I tried some of the suggestions above, but those didn't help either.&lt;/P&gt;
&lt;P&gt;The only way I could get around this was to create a temporary view from the DataFrame and do a SELECT with only a limited number of results on it. After this I was able to use the entire temporary view without any issues. If I don't put a LIMIT on the number of results, I run into the same issue again, even with the view. Hope this helps someone.&lt;/P&gt; 
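A rough sketch of the workaround described above (the names `spark`, `df`, the view name, and the LIMIT value are assumptions for illustration, not from the original post):

```python
# Hypothetical sketch of the temporary-view-plus-LIMIT workaround
# described above. build_limited_query is a plain string helper;
# via_limited_view needs an existing SparkSession (`spark`) and
# DataFrame (`df`), both assumed here.
def build_limited_query(view_name, limit):
    # An explicit LIMIT was what made the subsequent reads work
    # for the poster; the value is workload-specific.
    return "SELECT * FROM {0} LIMIT {1}".format(view_name, limit)

def via_limited_view(spark, df, view_name, limit):
    df.createOrReplaceTempView(view_name)  # register the view
    return spark.sql(build_limited_query(view_name, limit))
```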
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 19 Nov 2019 17:12:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29365#M21099</guid>
      <dc:creator>SatyaD</dc:creator>
      <dc:date>2019-11-19T17:12:40Z</dc:date>
    </item>
    <item>
      <title>Re: ExecutorLostFailure: Remote RPC Client Disassociated</title>
      <link>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29366#M21100</link>
      <description>&lt;P&gt;According to &lt;A href="https://docs.databricks.com/jobs.html#jar-job-tips" target="_blank"&gt;https://docs.databricks.com/jobs.html#jar-job-tips&lt;/A&gt;:&lt;/P&gt;
&lt;P&gt;&lt;I&gt;"Job output, such as log output emitted to stdout, is subject to a 20MB size limit. If the total output has a larger size, the run will be canceled and marked as failed."&lt;/I&gt;&lt;/P&gt;
&lt;P&gt;That was my problem. To "&lt;I&gt;fix it&lt;/I&gt;" I just set the logging level to ERROR:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;val sc = SparkContext.getOrCreate(conf)
sc.setLogLevel("ERROR")&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;That solved it.&lt;/P&gt;</description>
      <pubDate>Tue, 10 Dec 2019 16:36:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29366#M21100</guid>
      <dc:creator>RodrigoDe_Freit</dc:creator>
      <dc:date>2019-12-10T16:36:36Z</dc:date>
    </item>
    <item>
      <title>Re: ExecutorLostFailure: Remote RPC Client Disassociated</title>
      <link>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29367#M21101</link>
      <description>&lt;P&gt;According to &lt;A href="https://docs.databricks.com/jobs.html#jar-job-tips" target="_blank"&gt;https://docs.databricks.com/jobs.html#jar-job-tips&lt;/A&gt;:&lt;/P&gt;&lt;P&gt;&lt;I&gt;"Job output, such as log output emitted to stdout, is subject to a 20MB size limit. If the total output has a larger size, the run will be canceled and marked as failed."&lt;/I&gt;&lt;/P&gt;&lt;P&gt;That was my problem. To "&lt;I&gt;fix it&lt;/I&gt;" I just set the logging level to ERROR:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;val sc = SparkContext.getOrCreate(conf)
sc.setLogLevel("ERROR")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;This workaround works for me.&lt;/P&gt;</description>
      <pubDate>Tue, 10 Dec 2019 19:56:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29367#M21101</guid>
      <dc:creator>RodrigoDe_Freit</dc:creator>
      <dc:date>2019-12-10T19:56:17Z</dc:date>
    </item>
    <item>
      <title>Re: ExecutorLostFailure: Remote RPC Client Disassociated</title>
      <link>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29368#M21102</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I am facing the same error, but log output to stdout is not the issue, as the log file turns out to be &amp;lt; 2 MB, so that cause is ruled out. Moreover, our job is a dummy for testing purposes and is not doing any memory-intensive operation. It is purely running a simple thread that keeps logging to stdout every 5 minutes.&lt;/P&gt;
&lt;P&gt;Still, the cluster is getting timed out.&lt;/P&gt;
&lt;P&gt;Below is the post I have submitted on Stack Overflow:&lt;/P&gt;
&lt;P&gt;&lt;A href="https://stackoverflow.com/questions/59820940/databricks-job-timed-out-with-error-lost-executor-0-on-ip-remote-rpc-client" target="_blank"&gt;https://stackoverflow.com/questions/59820940/databricks-job-timed-out-with-error-lost-executor-0-on-ip-remote-rpc-client&lt;/A&gt;&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 20 Jan 2020 10:20:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29368#M21102</guid>
      <dc:creator>ankurmalhotra89</dc:creator>
      <dc:date>2020-01-20T10:20:48Z</dc:date>
    </item>
    <item>
      <title>Re: ExecutorLostFailure: Remote RPC Client Disassociated</title>
      <link>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29369#M21103</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;We did this same "chopping off the DAG" approach some time ago amid S3 writing problems, and apparently I need to revive it again. Have you tried .checkpoint() instead?&lt;/P&gt; 
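For reference, a minimal sketch of the `.checkpoint()` idea mentioned above (the `spark`/`df` names and the checkpoint directory path are assumptions for illustration):

```python
# Hypothetical sketch of the .checkpoint() approach: eagerly materialize
# the DataFrame and truncate its lineage, instead of the write-to-Parquet
# -and-reread trick. `spark` and `df` are assumed to be an existing
# SparkSession and DataFrame.
def checkpoint_df(spark, df, checkpoint_dir):
    # Checkpoint files must go to reliable storage (e.g. DBFS/HDFS).
    spark.sparkContext.setCheckpointDir(checkpoint_dir)
    # eager=True runs the job now and drops the accumulated DAG history.
    return df.checkpoint(eager=True)
```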
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 04 Feb 2021 14:51:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/executorlostfailure-remote-rpc-client-disassociated/m-p/29369#M21103</guid>
      <dc:creator>WarrenStephens</dc:creator>
      <dc:date>2021-02-04T14:51:47Z</dc:date>
    </item>
  </channel>
</rss>

