I am trying to read data from Elasticsearch (ES version 8.5.2) using PySpark on Databricks (runtime 13.0, which includes Apache Spark 3.4.0 and Scala 2.12). The whole ecosystem runs on AWS.
I am able to run a curl command from the Databricks notebook against the ES ip:port and fetch data, which tells me network access is available. However, I am unable to read from the same ES cluster through PySpark.
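For reference, the verification call looked roughly like this (run from a %sh notebook cell; the scheme, port, endpoint, and credentials here are placeholders, not the real values):
------------------
%sh
# Hypothetical sketch of the connectivity check; replace placeholders with real values
curl -u "$ES_USER:$ES_PASS" "http://es01-nonprod.office.io:9200/_cat/indices?v"
------------------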
Below is the code.
Jars installed on the cluster:
org.elasticsearch:elasticsearch-spark-30_2.12:8.5.2
org.elasticsearch:elasticsearch-hadoop:8.5.2
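Both coordinates are attached to the cluster as Maven libraries. Equivalently (a sketch of one alternative way to attach the connector, not necessarily what the cluster does today), the package can be pulled in through the cluster's Spark config:
------------------
spark.jars.packages org.elasticsearch:elasticsearch-spark-30_2.12:8.5.2
------------------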
------------------
df = (spark.read
    .format("org.elasticsearch.spark.sql")
    .option("spark.es.nodes.wan.only", "true")
    .option("spark.es.nodes", "es01-nonprod.office.io")
    # .option("es.net.ssl", "true")
    .option("spark.es.net.http.auth.user", username)
    .option("spark.es.net.http.auth.pass", password)
    .option("spark.es.port", port)
    # .option("es.net.ssl.protocol", "https")
    .option("spark.es.nodes.discovery", "false")
    # .option("es.nodes.client.only", "false")
    # .option("spark.es.scheme", "https")
    # .option("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    # .option("spark.es.http.timeout", "10m")
    # .option("es.net.ssl.keystore.type", "CRT")
    # .option("es.net.ssl.truststore.location", "/etc/ssl/certs/ca-certificates.crt")
    .load(index)
)
display(df)
----------------
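For comparison, the elasticsearch-hadoop documentation writes these reader options without the spark. prefix (the prefixed form is meant for values set in the Spark configuration itself). A minimal sketch of the same read in that form, reusing the username, password, port, and index variables from above:
----------------
df = (spark.read
    .format("org.elasticsearch.spark.sql")
    # Connector option names as shown in the elasticsearch-hadoop docs (es.* keys)
    .option("es.nodes.wan.only", "true")
    .option("es.nodes", "es01-nonprod.office.io")
    .option("es.port", port)
    .option("es.net.http.auth.user", username)
    .option("es.net.http.auth.pass", password)
    .option("es.nodes.discovery", "false")
    .load(index)
)
----------------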
Error screenshot attached; as noted above, the curl command works just fine.
I've tried:
- adding all the spark configurations during cluster creation (see the sketch after this list);
- changing the jars to org.elasticsearch:elasticsearch-hadoop:8.5.2 alone.
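For the cluster-level attempt, the Spark config box looked roughly like this (a sketch; 9200 stands in for the actual port):
----------------
spark.es.nodes es01-nonprod.office.io
spark.es.port 9200
spark.es.nodes.wan.only true
spark.es.nodes.discovery false
----------------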
Any help resolving this would be appreciated.