09-05-2023 12:04 PM
Hello,
I am trying to read topics from a Kafka stream, but I am getting the timeout error below.
23/09/05 18:30:52 INFO NetworkClient: [AdminClient clientId=Databricks] Disconnecting from node 4 due to socket connection setup timeout. The timeout value is 11054 ms.
I can ping the Kafka broker from Databricks; the error seems to occur when I try to grab data.
Example code.
inputDF = (spark
    .readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", kafka_broker)
    .option("kafka.ssl.endpoint.identification.algorithm", "https")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.jaas.config", "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required username='{}' password='{}';".format("123", "456"))
    .option("subscribe", topic)
    .option("spark.streaming.kafka.maxRatePerPartition", "5")
    .option("startingOffsets", "earliest")
    .option("kafka.session.timeout.ms", "10000")
    .load())
display(inputDF)
Does anyone have any inkling as to why this might be happening?
09-05-2023 01:08 PM
Hi @kwasi,
• The issue with Kafka timing out when reading from a Kafka stream in Databricks could be due to network issues, configuration issues, or Kafka server overload.
• To address the issue, check network connectivity, check Kafka and Databricks configurations, and adjust Kafka timeout settings.
• To check network connectivity, use network diagnostic tools to check for packet loss and latency.
• To check Kafka and Databricks configurations, ensure that kafka.bootstrap.servers points to the correct hostname or IP address and port, and that the Kafka server is accessible from the cluster.
• To adjust Kafka timeout settings, increase the timeout value in the Kafka configuration.
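As a rough illustration of the last bullet, the sketch below collects a few client-side timeout settings into one dictionary. The helper name kafka_timeout_options and the values are hypothetical. The property names are standard Kafka client settings (the socket connection setup ones need Kafka client 2.7 or later), and it is an assumption that the `kafka.`-prefixed options reach the admin client named in the log. Longer timeouts only help with slow connections; they will not fix a broker that is unreachable.

```python
def kafka_timeout_options(setup_ms=30000, setup_max_ms=60000, request_ms=60000):
    """Build Spark Kafka source options that lengthen client-side timeouts.

    Hypothetical helper: the values are placeholders, not recommendations.
    Spark forwards options prefixed with "kafka." to the Kafka client.
    """
    return {
        # Initial wait for a TCP/TLS connection to a broker to be established.
        "kafka.socket.connection.setup.timeout.ms": str(setup_ms),
        # Upper bound for that wait after exponential backoff.
        "kafka.socket.connection.setup.timeout.max.ms": str(setup_max_ms),
        # How long the client waits for a response to a request.
        "kafka.request.timeout.ms": str(request_ms),
    }


# Possible usage in a notebook (assumes `spark`, `kafka_broker` and `topic` exist):
# reader = (spark.readStream.format("kafka")
#           .option("kafka.bootstrap.servers", kafka_broker)
#           .option("subscribe", topic)
#           .options(**kafka_timeout_options()))
```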
09-06-2023 06:59 AM
@Kaniz_Fatma Thanks for the reply.
• I don't seem to have a problem with connection, running
display(inputDF)
09-06-2023 07:12 AM
Hi @kwasi ,
• Check Spark UI for input events from the source
• Check processing time on Spark UI
• Check batch details in the 'Completed Batches' section
• Check thread dump in Spark UI for hanging or slow-running tasks
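The same information can be inspected from a notebook. The sketch below is only an illustration: summarize_progress is a hypothetical helper, and it assumes the query's lastProgress is a dict (or None before the first batch completes), as in PySpark 3.x.

```python
def summarize_progress(progress):
    """Condense a StreamingQuery.lastProgress dict into one line.

    Hypothetical helper; `progress` is None until a batch has completed.
    """
    if not progress:
        return "no progress reported yet"
    duration = progress.get("durationMs", {})
    return (
        f"batch={progress.get('batchId')} "
        f"rows={progress.get('numInputRows')} "
        f"in/s={progress.get('inputRowsPerSecond')} "
        f"proc/s={progress.get('processedRowsPerSecond')} "
        f"trigger_ms={duration.get('triggerExecution')}"
    )


# Possible usage in a notebook (assumes an active `spark` session):
# for q in spark.streams.active:
#     print(q.name, q.status, summarize_progress(q.lastProgress))
```

If no progress is ever reported, the query is probably stuck before the first batch, which would be consistent with a connection problem rather than slow processing.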
09-06-2023 10:11 PM
As we can see from the error, the failure is happening during DescribeTopics. You can check with the Kafka team to see if the brokers are communicating fine with the controller. It is timing out while trying to communicate with the nodes.
Getting the broker logs will help us.
05-01-2024 03:21 AM
@kwasi -- were you able to fix this? I am facing this issue now and any help / leads would greatly help me out 🙂
05-01-2024 03:39 AM
Hi @Murthy1 ,
Are you able to connect to Kafka from Databricks, and are the brokers healthy? The error indicates that Databricks is unable to connect to the Kafka cluster, possibly due to network issues or incorrect configuration.
You can try the nc command from a notebook to validate connectivity.
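If nc is not available on the cluster, a plain TCP check in Python does roughly the same job. This is a minimal sketch: parse_bootstrap_servers and can_connect are hypothetical helper names, and the 5-second timeout is arbitrary. Note that a successful ping only shows ICMP reachability, not that the Kafka port is open. Every broker's advertised host and port must also be reachable from the cluster, not just the bootstrap address, so it may be worth testing the address of the node named in the log (node 4 in the original post) as well.

```python
import socket


def parse_bootstrap_servers(servers: str) -> list[tuple[str, int]]:
    """Split 'host1:9092,host2:9092' into (host, port) pairs."""
    pairs = []
    for entry in servers.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # rpartition splits on the last ':' so the port is always the tail.
        host, _, port = entry.rpartition(":")
        pairs.append((host, int(port)))
    return pairs


def can_connect(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


# Possible usage in a notebook (assumes `kafka_broker` holds the bootstrap string):
# for host, port in parse_bootstrap_servers(kafka_broker):
#     print(host, port, "reachable" if can_connect(host, port) else "NOT reachable")
```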
Thanks!
05-01-2024 05:51 AM
05-21-2024 02:54 AM
Hi @Murthy1,
Is this an intermittent issue, or are you facing it regularly? The failure occurs while fetching the topic-level metadata.
I checked internally on this, and it could be a network issue. We may have to do a deeper dive on this.
2 weeks ago
Which Event Hubs namespace were you using?
I had the same problem and resolved it by changing the pricing tier from Basic to Standard, as Kafka applications are not supported on the Basic tier.
Let me know if you have any other questions. Thanks
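For readers who are connecting to Azure Event Hubs through its Kafka endpoint, the options usually look something like the sketch below. It is an illustration only: event_hubs_kafka_options is a hypothetical helper, the namespace and connection string are placeholders, and the `kafkashaded.` class prefix is taken from the original post's Databricks configuration. The thread does not confirm that the original poster was using Event Hubs.

```python
def event_hubs_kafka_options(namespace: str, connection_string: str) -> dict:
    """Build Spark Kafka source options for an Event Hubs Kafka endpoint.

    Hypothetical helper. Event Hubs expects the literal username
    "$ConnectionString" and the connection string as the password.
    """
    jaas = (
        "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule "
        f'required username="$ConnectionString" password="{connection_string}";'
    )
    return {
        # The Kafka endpoint of an Event Hubs namespace listens on port 9093.
        "kafka.bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "kafka.security.protocol": "SASL_SSL",
        "kafka.sasl.mechanism": "PLAIN",
        "kafka.sasl.jaas.config": jaas,
    }


# Possible usage (assumes `spark` and `topic`; values below are placeholders):
# opts = event_hubs_kafka_options("my-namespace", "<connection string>")
# df = spark.readStream.format("kafka").options(**opts).option("subscribe", topic).load()
```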