Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Unable to read data from ElasticSearch using Databricks (AWS): Cannot detect ES version - Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings) - all nodes failed; tried [IP:PORT]

naveenprabhun
New Contributor III

I am trying to read data from ElasticSearch (ES version 8.5.2) using PySpark on Databricks Runtime 13.0 (includes Apache Spark 3.4.0, Scala 2.12). The ecosystem is on AWS.

I am able to run a curl command from the Databricks notebook against the ES ip:port and fetch data, which tells me network access is available.
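For reference, a rough Python equivalent of that connectivity check (sketch only; the scheme, port 9200 and the exact credentials are assumptions, not values from the post):

# Quick connectivity probe from the notebook, roughly equivalent to the curl check.
# Host, port, scheme and credentials below are illustrative placeholders.
import requests

resp = requests.get(
    "https://es01-nonprod.office.io:9200",  # assumed HTTPS endpoint and port
    auth=(username, password),              # basic auth, like curl -u
    verify=False,                           # like curl -k; skips certificate validation
    timeout=10,
)
print(resp.status_code, resp.json().get("version", {}).get("number"))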

But I am unable to read from the same ES cluster through PySpark.

Below is the code.

Jars:

org.elasticsearch:elasticsearch-spark-30_2.12:8.5.2
org.elasticsearch:elasticsearch-hadoop:8.5.2
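(Not from the original post: on Databricks these connector jars are normally attached to the cluster as Maven libraries. For a standalone PySpark session, the rough equivalent would be passing the coordinate through spark.jars.packages, as in this sketch.)

# Illustrative only: outside Databricks, the connector coordinate can be
# supplied when the session is created. On Databricks, attach it as a
# cluster library instead.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.jars.packages",
                 "org.elasticsearch:elasticsearch-spark-30_2.12:8.5.2")
         .getOrCreate())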

------------------

df = (spark.read
      .format("org.elasticsearch.spark.sql")
      .option("spark.es.nodes.wan.only", "true")
      .option("spark.es.nodes", "es01-nonprod.office.io")
      #.option("es.net.ssl", "true")
      .option("spark.es.net.http.auth.user", username)
      .option("spark.es.net.http.auth.pass", password)
      .option("spark.es.port", port)
      #.option("es.net.ssl.protocol", "https")
      .option("spark.es.nodes.discovery", "false")
      #.option("es.nodes.client.only", "false")
      #.option("spark.es.scheme", "https")
      #.option("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      #.option("spark.es.http.timeout", "10m")
      #.option("es.net.ssl.keystore.type", "CRT")
      #.option("es.net.ssl.truststore.location", "/etc/ssl/certs/ca-certificates.crt")
      .load(f"{index}")
     )

display(df)

----------------

Error screenshot (attached).

The curl command works just fine (screenshot attached).

I've tried:

- adding all the Spark configurations during cluster creation
- changing the jar to org.elasticsearch:elasticsearch-hadoop:8.5.2

A resolution would be appreciated.

1 ACCEPTED SOLUTION


You can try adding the certificates into a truststore and storing it on the cluster, then provide the truststore path in the Spark es.net.ssl.keystore.location and es.net.ssl.truststore.location parameters.
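A minimal sketch of how that could look in the original read, assuming the cluster's CA certificate has been imported into a JKS truststore (for example with keytool) and uploaded somewhere the cluster can read it; the truststore path, the truststore_password variable and the es.*-style option keys below are illustrative, not taken from the thread:

# Sketch only: assumes an HTTPS endpoint and a JKS truststore uploaded to DBFS.
# The truststore path and truststore_password are placeholders.
df = (spark.read
      .format("org.elasticsearch.spark.sql")
      .option("es.nodes.wan.only", "true")
      .option("es.nodes", "es01-nonprod.office.io")
      .option("es.port", port)
      .option("es.net.http.auth.user", username)
      .option("es.net.http.auth.pass", password)
      .option("es.net.ssl", "true")  # enable TLS in the connector
      .option("es.net.ssl.truststore.location", "file:///dbfs/FileStore/certs/es-truststore.jks")
      .option("es.net.ssl.truststore.pass", truststore_password)
      .load(index)
     )
display(df)

Only the es.net.ssl.truststore.* settings should be needed for the driver and executors to trust the server certificate; the es.net.ssl.keystore.* settings come into play only if the Elasticsearch cluster also requires client certificates.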


2 REPLIES

Hoviedo
New Contributor III

I have the same problem. Did you find a solution? Thanks.

You can try adding the certificates into a truststore and storing it on the cluster, then provide the truststore path in the Spark es.net.ssl.keystore.location and es.net.ssl.truststore.location parameters.
