What is "ExecuteGrpcResponseSender: Deadline reached, shutting down stream"

Brad
Contributor II

 

Hi, 

I have a Delta table that is loaded by a structured streaming job. When I read this Delta table as a stream and run a MERGE inside foreachBatch, I sometimes see a long gap between the stream starting and the MERGE starting to run; it seems Spark is waiting for something. From the log I can see:

INFO ExecuteGrpcResponseSender: Starting for opId=5ef071b7-xxx, reattachable=true, lastConsumedStreamIndex=0
...
INFO SessionHolder: Session SessionKey(69xxx,04470efa-xxxx) accessed, time 1728792222507.
...
INFO ExecuteGrpcResponseSender: Deadline reached, shutting down stream for opId=5ef071b7-xxx after index 0. totalTime=120001284340ns waitingForResults=120001197790ns waitingForSend=0ns
INFO SessionHolder: Session SessionKey(69xxx,04470efa-xxxx) accessed, time 1728792342527.
INFO ExecuteGrpcResponseSender: Starting for opId=5ef071b7-xxx, reattachable=true, lastConsumedStreamIndex=0
...

 

 

There are many "INFO ExecuteGrpcResponseSender: Deadline reached, shutting down stream..." messages, and it seems something is timing out after 120s. I tried to set:

spark.network.timeout: 800s
spark.streaming.backpressure.enabled: true

but I can still see those deadline messages in the log.
What is happening here? Is there a config I can set to avoid this, since it seems to slow down the job?
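
For reference, the job is shaped roughly like this (a minimal sketch; table names, the key column, and the checkpoint path are placeholders, and the real merge logic is omitted):

# Minimal sketch of the job shape (placeholder names; real logic omitted).
from delta.tables import DeltaTable

def merge_batch(batch_df, batch_id):
    # MERGE the micro-batch into the target Delta table.
    target = DeltaTable.forName(batch_df.sparkSession, "target_table")
    (target.alias("t")
        .merge(batch_df.alias("s"), "t.id = s.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

(spark.readStream
    .table("source_table")
    .writeStream
    .foreachBatch(merge_batch)
    .option("checkpointLocation", "/tmp/checkpoints/merge_job")  # placeholder
    .start())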

Thanks

3 REPLIES

NandiniN
Databricks Employee

We need to understand why the upstream of the REPL cancelled the request. It could be resource exhaustion. Do you see "java.lang.OutOfMemoryError"?

I have seen https://issues.apache.org/jira/browse/SPARK-49492 be the cause of such an error in a past issue.

Do you see this issue regularly, or is it intermittent? Restarting the cluster will mitigate the issue, but to review the logs you may have to enable cluster log delivery to investigate further.
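
For reference, cluster log delivery is configured on the cluster spec via its cluster_log_conf field; a minimal sketch of that fragment (the destination path is a placeholder):

# Minimal sketch: the cluster_log_conf fragment of a Clusters API cluster
# spec, which enables cluster log delivery. The destination is a placeholder.
cluster_spec_fragment = {
    "cluster_log_conf": {
        "dbfs": {"destination": "dbfs:/cluster-logs/my-cluster"}
    }
}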

Brad
Contributor II

This might be a bug. The issue is gone if I change the cluster from shared mode to single-user mode.

NandiniN
Databricks Employee

It may not necessarily be a bug, but something that needs tuning due to architectural differences.

What the message says is:

  • The system was processing a gRPC operation identified by opId=5ef071b7-xxx, and it set a deadline for that operation (likely 120 seconds).
  • The operation didn't complete in time and exceeded the deadline, so the system has shut down the stream and stopped waiting for further results.
  • The operation spent almost all of its time (around 120 seconds) waiting for results and spent no time sending data back to the client; the nanosecond timings in the log confirm this, as shown in the sketch after this list.
  • It is an INFO message, indicating an event.
  • Shared mode uses Spark Connect under the hood, but single-user mode does not, which is why you do not see these logs on a single-user cluster.
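
A quick sanity check on those timings, which are in nanoseconds:

# Quick check: converting the log's nanosecond timings to seconds shows the
# stream hit the ~120 s deadline almost entirely while waiting for results.
total_time_ns = 120_001_284_340           # totalTime from the log line
waiting_for_results_ns = 120_001_197_790  # waitingForResults from the log line

print(total_time_ns / 1e9)            # ~120.0 seconds total
print(waiting_for_results_ns / 1e9)   # ~120.0 seconds waiting for results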

However, as our next steps:

 

  • We can try to understand the cause of the delay on the external resource or service the request is sent to.
  • It is possible the timeouts are not the same, or need to be bumped up; a hedged example follows this list.
  • Are there any other errors that you see along with these messages?
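
One possibility, as an assumption to verify against your Spark version rather than a confirmed fix: in open-source Spark Connect the sender's stream deadline defaults to 2 minutes, which matches the ~120 s in your log, and is controlled by a server-side setting that could be raised in the cluster's Spark config. Note this would only lengthen the window before the stream is cycled; it would not speed up a MERGE that is genuinely waiting for results.

spark.connect.execute.reattachable.senderMaxStreamDuration: 5m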

 

However, yes, it may need a more in-depth look.
