<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Error:   TransportResponseHandler: Still have 1 requests outstanding when connection, occurring only on large dataset. in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/error-transportresponsehandler-still-have-1-requests-outstanding/m-p/29287#M21031</link>
    <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I have also encountered this problem; my guess is that during the shuffle the network bandwidth reaches its limit and the connection times out. I solved this problem by reducing the number of executors.&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 21 Nov 2017 06:29:50 GMT</pubDate>
    <dc:creator>gang_liugang_li</dc:creator>
    <dc:date>2017-11-21T06:29:50Z</dc:date>
    <item>
      <title>Error:   TransportResponseHandler: Still have 1 requests outstanding when connection, occurring only on large dataset.</title>
      <link>https://community.databricks.com/t5/data-engineering/error-transportresponsehandler-still-have-1-requests-outstanding/m-p/29285#M21029</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I am getting the error below only with a large dataset (i.e. 15 TB compressed). If my dataset is small (1 TB), I do not get this error.&lt;/P&gt;
&lt;P&gt;It looks like it fails at the shuffle stage. The approximate number of mappers is 150,000.&lt;/P&gt;
&lt;P&gt;&lt;B&gt;Spark config:&lt;/B&gt;&lt;/P&gt;
&lt;P&gt;spark.sql.warehouse.dir hdfs:///user/spark/warehouse&lt;/P&gt;
&lt;P&gt;&lt;/P&gt; 
&lt;P&gt;spark.yarn.dist.files file:/etc/spark/conf/hive-site.xml&lt;/P&gt; 
&lt;P&gt;spark.executor.extraJavaOptions -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p'&lt;/P&gt; 
&lt;P&gt;spark.driver.host 172.20.103.94&lt;/P&gt; 
&lt;P&gt;spark.history.fs.logDirectory hdfs:///var/log/spark/apps&lt;/P&gt; 
&lt;P&gt;spark.eventLog.enabled true&lt;/P&gt; 
&lt;P&gt;spark.ui.port 0&lt;/P&gt; 
&lt;P&gt;spark.driver.port 35246&lt;/P&gt; 
&lt;P&gt;spark.shuffle.service.enabled true&lt;/P&gt; 
&lt;P&gt;spark.driver.extraLibraryPath /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native&lt;/P&gt; 
&lt;P&gt;spark.yarn.historyServer.address ip-172-20-99-29.ec2.internal:18080&lt;/P&gt; 
&lt;P&gt;spark.yarn.app.id application_1486842541319_0002&lt;/P&gt; 
&lt;P&gt;spark.scheduler.mode FIFO&lt;/P&gt; 
&lt;P&gt;spark.driver.memory 10g&lt;/P&gt; 
&lt;P&gt;spark.executor.id driver&lt;/P&gt; 
&lt;P&gt;spark.yarn.app.container.log.dir /var/log/hadoop-yarn/containers/application_1486842541319_0002/container_1486842541319_0002_01_000001&lt;/P&gt; 
&lt;P&gt;spark.driver.extraJavaOptions -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p'&lt;/P&gt; 
&lt;P&gt;spark.submit.deployMode cluster&lt;/P&gt; 
&lt;P&gt;spark.master yarn&lt;/P&gt; 
&lt;P&gt;spark.ui.filters org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter&lt;/P&gt; 
&lt;P&gt;spark.executor.extraLibraryPath /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native&lt;/P&gt; 
&lt;P&gt;spark.sql.hive.metastore.sharedPrefixes com.amazonaws.services.dynamodbv2&lt;/P&gt; 
&lt;P&gt;spark.executor.memory 5120M&lt;/P&gt; 
&lt;P&gt;spark.driver.extraClassPath /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*&lt;/P&gt; 
&lt;P&gt;spark.eventLog.dir hdfs:///var/log/spark/apps&lt;/P&gt; 
&lt;P&gt;spark.dynamicAllocation.enabled true&lt;/P&gt; 
&lt;P&gt;spark.executor.extraClassPath /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*&lt;/P&gt; 
&lt;P&gt;spark.executor.cores 8&lt;/P&gt; 
&lt;P&gt;spark.history.ui.port 18080&lt;/P&gt; 
&lt;P&gt;spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS ip-172-20-99-29.ec2.internal&lt;/P&gt; 
&lt;P&gt;spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES &lt;A href="http://ip-172-20-99-29.ec2.internal:20888/proxy/application_1486842541319_0002" target="_blank"&gt;http://ip-172-20-99-29.ec2.internal:20888/proxy/application_1486842541319_0002&lt;/A&gt;&lt;/P&gt; 
&lt;P&gt;spark.app.id application_1486842541319_0002&lt;/P&gt; 
&lt;P&gt;spark.hadoop.yarn.timeline-service.enabled false&lt;/P&gt; 
&lt;P&gt;spark.sql.shuffle.partitions 10000 &lt;/P&gt;
&lt;P&gt;&lt;B&gt;Error Trace:&lt;/B&gt;&lt;/P&gt;
&lt;P&gt;17/02/11 22:01:05 INFO ShuffleBlockFetcherIterator: Started 29 remote fetches in 2700 ms&lt;/P&gt;
&lt;P&gt;17/02/11 22:03:04 ERROR TransportChannelHandler: Connection to ip-172-20-96-109.ec2.internal/172.20.96.109:7337 has been quiet for 120000 ms while there are outstanding requests. Assuming connection is dead; please adjust spark.network.timeout if this is wrong.&lt;/P&gt;
&lt;P&gt;17/02/11 22:03:04 ERROR TransportResponseHandler: Still have 1 requests outstanding when connection from ip-172-20-96-109.ec2.internal/172.20.96.109:7337 is closed&lt;/P&gt;
&lt;P&gt;17/02/11 22:03:04 ERROR OneForOneBlockFetcher: Failed while starting block fetches&lt;/P&gt;
&lt;P&gt;java.io.IOException: Connection from ip-172-20-96-109.ec2.internal/172.20.96.109:7337 closed&lt;/P&gt;
&lt;P&gt; at org.apache.spark.network.client.TransportResponseHandler.channelInactive(TransportResponseHandler.java:128)&lt;/P&gt;
&lt;P&gt; at org.apache.spark.network.server.TransportChannelHandler.channelInactive(TransportChannelHandler.java:109)&lt;/P&gt;
&lt;P&gt; at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:251)&lt;/P&gt;
&lt;P&gt; at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237)&lt;/P&gt;
&lt;P&gt; at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:230)&lt;/P&gt;
&lt;P&gt; at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)&lt;/P&gt;
&lt;P&gt; at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:257)&lt;/P&gt;
&lt;P&gt; at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:251)&lt;/P&gt;
&lt;P&gt; at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237)&lt;/P&gt;
&lt;P&gt; at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:230)&lt;/P&gt;
&lt;P&gt; at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)&lt;/P&gt;
&lt;P&gt; at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:251)&lt;/P&gt;
&lt;P&gt; at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237)&lt;/P&gt;
&lt;P&gt; at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:230)&lt;/P&gt;
&lt;P&gt; at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)&lt;/P&gt;
&lt;P&gt; at org.apache.spark.network.util.TransportFrameDecoder.channelInactive(TransportFrameDecoder.java:182)&lt;/P&gt;
&lt;P&gt; at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:251)&lt;/P&gt;
&lt;P&gt; at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237)&lt;/P&gt;
&lt;P&gt; at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:230)&lt;/P&gt;
&lt;P&gt; at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1289)&lt;/P&gt;
&lt;P&gt; at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:251)&lt;/P&gt;
&lt;P&gt; at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237)&lt;/P&gt;
&lt;P&gt; at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:893)&lt;/P&gt;
&lt;P&gt; at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:691)&lt;/P&gt;
&lt;P&gt; at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:408)&lt;/P&gt;
&lt;P&gt; at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:455)&lt;/P&gt;
&lt;P&gt; at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140)&lt;/P&gt;
&lt;P&gt; at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)&lt;/P&gt;
&lt;P&gt;at java.lang.Thread.run(Thread.java:745)&lt;/P&gt; 
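The "quiet for 120000 ms" check in the trace above is governed by Spark's network-timeout settings. As a minimal sketch (the values and the file path are illustrative assumptions, not a verified fix for this job), the knobs usually raised first look like this in a spark-defaults.conf fragment:

```shell
# Write an illustrative spark-defaults.conf fragment raising the shuffle
# network timeouts; 600s and 10 retries are example values, not a recommendation.
cat > /tmp/spark-network-tuning.conf <<'EOF'
spark.network.timeout        600s
spark.shuffle.io.maxRetries  10
spark.shuffle.io.retryWait   30s
EOF
cat /tmp/spark-network-tuning.conf
```

spark.network.timeout is the default for several lower-level timeouts, so raising it alone covers the shuffle-client idle check reported in the log.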
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 12 Feb 2017 01:34:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-transportresponsehandler-still-have-1-requests-outstanding/m-p/29285#M21029</guid>
      <dc:creator>SatheesshChinnu</dc:creator>
      <dc:date>2017-02-12T01:34:17Z</dc:date>
    </item>
    <item>
      <title>Re: Error:   TransportResponseHandler: Still have 1 requests outstanding when connection, occurring only on large dataset.</title>
      <link>https://community.databricks.com/t5/data-engineering/error-transportresponsehandler-still-have-1-requests-outstanding/m-p/29286#M21030</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I have increased my timeout to 1200s (i.e. spark.network.timeout=1200s). I am still getting the Netty error; this time it occurs during block replication.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;17/02/24 09:10:21 ERROR TransportResponseHandler: Still have 1 requests outstanding when connection from ip-172-20-101-120.ec2.internal/172.20.101.120:46113 is closed
17/02/24 09:10:21 ERROR NettyBlockTransferService: Error while uploading block rdd_24_2312
java.io.IOException: Connection from ip-172-20-101-120.ec2.internal/172.20.101.120:46113 closed&lt;/CODE&gt;&lt;/PRE&gt; 
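When raising the timeout alone does not help, another hedged option is to reduce the pressure each reducer puts on the shuffle service, so connections are less likely to stall. A sketch, with illustrative values only (both keys are standard Spark configs; the right numbers depend on the cluster):

```shell
# Illustrative fragment capping in-flight shuffle fetches per reducer;
# defaults are 48m and effectively unlimited requests.
cat > /tmp/spark-shuffle-pressure.conf <<'EOF'
spark.reducer.maxSizeInFlight  24m
spark.reducer.maxReqsInFlight  64
EOF
cat /tmp/spark-shuffle-pressure.conf
```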
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 24 Feb 2017 09:27:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-transportresponsehandler-still-have-1-requests-outstanding/m-p/29286#M21030</guid>
      <dc:creator>SatheesshChinnu</dc:creator>
      <dc:date>2017-02-24T09:27:46Z</dc:date>
    </item>
    <item>
      <title>Re: Error:   TransportResponseHandler: Still have 1 requests outstanding when connection, occurring only on large dataset.</title>
      <link>https://community.databricks.com/t5/data-engineering/error-transportresponsehandler-still-have-1-requests-outstanding/m-p/29287#M21031</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I have also encountered this problem; my guess is that during the shuffle the network bandwidth reaches its limit and the connection times out. I solved this problem by reducing the number of executors.&lt;/P&gt; 
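Since the original job runs with spark.dynamicAllocation.enabled=true, one way to apply this "fewer executors" workaround without disabling dynamic allocation is to cap the executor count. A sketch; the cap of 50 is an illustrative assumption, not a tested value:

```shell
# Illustrative fragment: keep dynamic allocation but bound how many
# executors it can request, limiting concurrent shuffle connections.
cat > /tmp/spark-executor-cap.conf <<'EOF'
spark.dynamicAllocation.enabled       true
spark.dynamicAllocation.maxExecutors  50
EOF
cat /tmp/spark-executor-cap.conf
```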
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 21 Nov 2017 06:29:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-transportresponsehandler-still-have-1-requests-outstanding/m-p/29287#M21031</guid>
      <dc:creator>gang_liugang_li</dc:creator>
      <dc:date>2017-11-21T06:29:50Z</dc:date>
    </item>
    <item>
      <title>Re: Error:   TransportResponseHandler: Still have 1 requests outstanding when connection, occurring only on large dataset.</title>
      <link>https://community.databricks.com/t5/data-engineering/error-transportresponsehandler-still-have-1-requests-outstanding/m-p/29288#M21032</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I am facing the same issue. I am doing a shuffle using a group-by-key operation with very few connectors, and I am getting the connection-closed error from one of the nodes.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 01 May 2018 23:19:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-transportresponsehandler-still-have-1-requests-outstanding/m-p/29288#M21032</guid>
      <dc:creator>srikanthvvgs</dc:creator>
      <dc:date>2018-05-01T23:19:13Z</dc:date>
    </item>
    <item>
      <title>Re: Error:   TransportResponseHandler: Still have 1 requests outstanding when connection, occurring only on large dataset.</title>
      <link>https://community.databricks.com/t5/data-engineering/error-transportresponsehandler-still-have-1-requests-outstanding/m-p/29289#M21033</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;@Satheessh Chinnusamy, how did you solve the above issue?&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 03 Sep 2018 09:20:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-transportresponsehandler-still-have-1-requests-outstanding/m-p/29289#M21033</guid>
      <dc:creator>parikshitbhoyar</dc:creator>
      <dc:date>2018-09-03T09:20:26Z</dc:date>
    </item>
  </channel>
</rss>