Data Engineering

Kafka timeout

kwasi
New Contributor II

Hello,
I am trying to read topics from a Kafka stream, but I am getting the timeout error below.

java.util.concurrent.ExecutionException: kafkashaded.org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: describeTopics


23/09/05 18:30:52 INFO NetworkClient: [AdminClient clientId=Databricks] Disconnecting from node 4 due to socket connection setup timeout. The timeout value is 11054 ms.

I can ping the Kafka broker from Databricks; the error seems to occur when I try to grab data.
Example code:

 

inputDF = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", kafka_broker)
    .option("kafka.ssl.endpoint.identification.algorithm", "https")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.jaas.config", "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required username='{}' password='{}';".format("123", "456"))
    .option("subscribe", topic)
    .option("spark.streaming.kafka.maxRatePerPartition", "5")
    .option("startingOffsets", "earliest")
    .option("kafka.session.timeout.ms", "10000")
    .load()
)
display(inputDF)

 

Does anyone have any inkling as to why this might be happening?

8 REPLIES

Kaniz_Fatma
Community Manager

Hi @kwasi

• A timeout when reading a Kafka stream from Databricks is usually caused by network issues, configuration issues, or an overloaded Kafka server.
• To address it, check network connectivity, review the Kafka and Databricks configurations, and adjust the Kafka timeout settings.
• To check network connectivity, use network diagnostic tools to look for packet loss and high latency between the cluster and the brokers.
• To check the Kafka and Databricks configurations, make sure the Kafka bootstrap server is set to the correct hostname or IP address and that the Kafka server is accessible from the cluster.
• To adjust the Kafka timeout settings, increase the timeout values in the Kafka client configuration.
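As a rough illustration of the last bullet, the sketch below builds a set of timeout options that could be passed to the `readStream` call from the original post. The helper name `kafka_timeout_options` and the default values are hypothetical and chosen only for illustration. The keys are standard Kafka client settings with the `kafka.` prefix the Spark Kafka source expects, but whether the connector's admin client honors each of them depends on the connector and Kafka client version, so treat that as an assumption to verify. Longer timeouts can only help when the brokers are reachable but slow; they will not fix a blocked network path.

```python
def kafka_timeout_options(request_timeout_ms=60000,
                          default_api_timeout_ms=120000,
                          socket_setup_timeout_ms=30000,
                          socket_setup_timeout_max_ms=60000):
    """Return Kafka client timeout settings as Spark Kafka source options.

    The default values are illustrative assumptions, not recommendations.
    """
    client_settings = {
        "request.timeout.ms": request_timeout_ms,
        "default.api.timeout.ms": default_api_timeout_ms,
        "socket.connection.setup.timeout.ms": socket_setup_timeout_ms,
        "socket.connection.setup.timeout.max.ms": socket_setup_timeout_max_ms,
    }
    # The Spark Kafka source forwards options prefixed with "kafka." to the
    # underlying Kafka client; option values are passed as strings.
    return {f"kafka.{key}": str(value) for key, value in client_settings.items()}


# Possible usage in a notebook (requires a Spark session, so left as a comment):
# inputDF = (spark.readStream.format("kafka")
#            .options(**kafka_timeout_options())
#            ...  # remaining options as in the original post
#            .load())
```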

kwasi
New Contributor II

@Kaniz_Fatma  Thanks for the reply.
• I don't seem to have a connection problem. Running

%sh nc -zv xxx.aws.confluent.cloud 9092 results in 
Connection to xxx.aws.confluent.cloud (xx.xx.xxx.xx) 9092 port [tcp/*] succeeded!

But the timeout happens when I try to actually retrieve data using Spark, as shown in the sample code above, i.e. after calling display(inputDF). Is there anything else that I am overlooking?

Kaniz_Fatma
Community Manager

Hi @kwasi , 

• Check the Spark UI for input events from the source
• Check the processing time in the Spark UI
• Check batch details in the 'Completed Batches' section
• Check thread dump in Spark UI for hanging or slow-running tasks

Tharun-Kumar
Honored Contributor II

@kwasi 

As we can see from the error, the failure is happening during DescribeTopics. You can check with the Kafka team to see if the brokers are communicating fine with the controller. It is timing out while trying to communicate with the nodes. 

Getting the broker logs will help us.

Murthy1
Contributor II

@kwasi -- were you able to fix this? I am facing this issue now and any help / leads would greatly help me out 🙂

NandiniN
Honored Contributor

Hi @Murthy1 ,

Are you able to connect to Kafka from Databricks, and are the brokers healthy? The error indicates that Databricks is unable to connect to the Kafka cluster, possibly due to network issues or incorrect configuration.

We can try the nc command from a notebook to validate the connectivity.
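A similar check can be sketched in Python and run from a notebook cell. The helper names `check_tcp` and `check_brokers` and the addresses in the usage comment are hypothetical. Because the log in the original post reports a disconnect from node 4, and Kafka clients connect to each broker's advertised address after the initial bootstrap, it may be worth checking every broker address you know of rather than only the bootstrap server. A successful TCP connection only shows that the port is reachable; it does not validate TLS or SASL settings.

```python
import socket
import time


def check_tcp(host, port, timeout_s=5.0):
    """Try to open a TCP connection; return (reachable, seconds_elapsed)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True, time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False, time.monotonic() - start


def check_brokers(addresses, timeout_s=5.0):
    """Check a list of "host:port" strings; return {address: reachable}."""
    results = {}
    for address in addresses:
        host, _, port = address.rpartition(":")
        reachable, _ = check_tcp(host, int(port), timeout_s)
        results[address] = reachable
    return results


# Possible usage (hypothetical broker addresses):
# print(check_brokers(["broker-0.example.com:9092", "broker-4.example.com:9092"]))
```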

Thanks!

Hello @NandiniN, thanks for responding! Yes, I am able to connect to Confluent Cloud (Kafka) from the notebook using the nc command. I get the error when I call df.show(), the same as @kwasi. Any help here would be appreciated!

NandiniN
Honored Contributor

Hi @Murthy1

Is this an intermittent issue, or are you facing it regularly? The failure occurs while fetching the topic-level metadata.

I checked internally on this, and it may be a network issue. We may need to do a deeper dive on it.

 
