
Unable to read data from ElasticSearch using Databricks (AWS) Cannot detect ES version - Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [IP:PORT]

naveenprabhun
New Contributor III

I am trying to read data from Elasticsearch (ES version 8.5.2) using PySpark on Databricks (13.0, which includes Apache Spark 3.4.0 and Scala 2.12). The ecosystem is on AWS.

I am able to run a curl command from the Databricks notebook against the ES ip:port and fetch the data, which tells me network access is available.

However, I am unable to read from the same ES cluster through PySpark.

Below are the jars and the code.

Jars:

org.elasticsearch:elasticsearch-spark-30_2.12:8.5.2
org.elasticsearch:elasticsearch-hadoop:8.5.2

------------------

df = (spark.read
    .format("org.elasticsearch.spark.sql")
    .option("spark.es.nodes.wan.only", "true")
    .option("spark.es.nodes", "es01-nonprod.office.io")
    #.option("es.net.ssl", "true")
    .option("spark.es.net.http.auth.user", username)
    .option("spark.es.net.http.auth.pass", password)
    .option("spark.es.port", port)
    #.option("es.net.ssl.protocol", "https")
    .option("spark.es.nodes.discovery", "false")
    #.option("es.nodes.client.only", "false")
    #.option("spark.es.scheme", "https")
    #.option("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    #.option("spark.es.http.timeout", "10m")
    #.option("es.net.ssl.keystore.type","CRT")
    #.option("es.net.ssl.truststore.location","/etc/ssl/certs/ca-certificates.crt")
    .load(f"{index}")
)

display(df)

----------------

Error screenshot: [image: ErrorScreenshot]

The curl command works just fine: [image: Screenshot 2023-06-01 at 1.25.29 PM]

I've tried:

Adding all the Spark configurations during cluster creation.

Changing the jar to org.elasticsearch:elasticsearch-hadoop:8.5.2.

Any help toward a resolution would be appreciated.

1 ACCEPTED SOLUTION

You can try adding the certificates to a truststore and storing it on the cluster. Then provide the path to that store in the es.net.ssl.keystore.location and es.net.ssl.truststore.location parameters of the Spark read.
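As a rough illustration of that suggestion, here is a minimal sketch that collects the TLS-related reader options in one place. It is not a confirmed fix for this cluster. The helper name `es_tls_read_options`, the truststore path under `/dbfs`, the passwords, and port 9200 are all hypothetical placeholders; the option keys themselves are standard elasticsearch-hadoop settings. The keys are written with the plain `es.` prefix, on the assumption that the `spark.es.` form is meant for Spark/cluster configuration rather than for `.option()` calls on the reader. The truststore is assumed to be a JKS file (for example one built with Java's `keytool`) that is readable from every node.

```python
def es_tls_read_options(host, port, username, password,
                        truststore_path, truststore_password,
                        keystore_path=None, keystore_password=None):
    """Build elasticsearch-hadoop reader options for an HTTPS endpoint.

    Hypothetical helper for illustration only; it just assembles a dict.
    """
    options = {
        "es.nodes": host,
        "es.port": str(port),
        # Talk only to the declared node (typical for cloud / WAN setups).
        "es.nodes.wan.only": "true",
        # Use HTTPS instead of plain HTTP.
        "es.net.ssl": "true",
        "es.net.http.auth.user": username,
        "es.net.http.auth.pass": password,
        # Locations are given as URLs; a bare path is treated as a classpath entry.
        "es.net.ssl.truststore.location": f"file://{truststore_path}",
        "es.net.ssl.truststore.pass": truststore_password,
    }
    # A keystore is only needed if the server requires a client certificate.
    if keystore_path is not None:
        options["es.net.ssl.keystore.location"] = f"file://{keystore_path}"
        options["es.net.ssl.keystore.pass"] = keystore_password or ""
    return options


# Placeholder values; replace them with your own host, port and secrets.
options = es_tls_read_options(
    host="es01-nonprod.office.io",
    port=9200,
    username="my_user",
    password="my_password",
    truststore_path="/dbfs/FileStore/certs/es-truststore.jks",
    truststore_password="changeit",
)

# On a cluster with the connector jar installed, the read would then look like:
# df = (spark.read
#       .format("org.elasticsearch.spark.sql")
#       .options(**options)
#       .load(index))
```

Keeping the options in a dict makes it easy to print them (minus the secrets) and confirm exactly which keys reach the connector before retrying the read.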

2 REPLIES

Hoviedo
New Contributor II

I have the same problem, did you find any solution? thanks
