<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Unable to read data from ElasticSearch using Databricks (AWS)
Cannot detect ES version - 
Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [IP:PORT] in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/unable-to-read-data-from-elasticsearch-using-databricks-aws/m-p/3761#M691</link>
    <description>&lt;P&gt;I have the same problem. Did you find a solution? Thanks.&lt;/P&gt;</description>
    <pubDate>Tue, 06 Jun 2023 19:42:20 GMT</pubDate>
    <dc:creator>Hoviedo</dc:creator>
    <dc:date>2023-06-06T19:42:20Z</dc:date>
    <item>
      <title>Unable to read data from ElasticSearch using Databricks (AWS)
Cannot detect ES version - 
Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [IP:PORT]</title>
      <link>https://community.databricks.com/t5/data-engineering/unable-to-read-data-from-elasticsearch-using-databricks-aws/m-p/3760#M690</link>
      <description>&lt;P&gt;I am trying to read data from Elasticsearch (version 8.5.2) using PySpark on Databricks (runtime 13.0, which includes Apache Spark 3.4.0 and Scala 2.12). The environment is on AWS.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I can run a curl command from the Databricks notebook against the ES ip:port and fetch the data, which tells me the cluster is reachable.&lt;/P&gt;&lt;P&gt;However, I am unable to read from the same ES instance through PySpark.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The code is below.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Jars:&lt;/P&gt;&lt;P&gt;org.elasticsearch:elasticsearch-spark-30_2.12:8.5.2&lt;/P&gt;&lt;P&gt;org.elasticsearch:elasticsearch-hadoop:8.5.2&lt;/P&gt;&lt;P&gt;------------------&lt;/P&gt;&lt;P&gt;df = (spark.read&lt;/P&gt;&lt;P&gt; .format("org.elasticsearch.spark.sql" )&lt;/P&gt;&lt;P&gt; .option("spark.es.nodes.wan.only","true" )&lt;/P&gt;&lt;P&gt; .option("spark.es.nodes","es01-nonprod.office.io" )&lt;/P&gt;&lt;P&gt; #.option("es.net.ssl", "true")&lt;/P&gt;&lt;P&gt; .option("spark.es.net.http.auth.user", username)&lt;/P&gt;&lt;P&gt; .option("spark.es.net.http.auth.pass", password)&lt;/P&gt;&lt;P&gt; .option("spark.es.port",port)&lt;/P&gt;&lt;P&gt; #.option("es.net.ssl.protocol", "https")&lt;/P&gt;&lt;P&gt; .option("spark.es.nodes.discovery", "false")&lt;/P&gt;&lt;P&gt; #.option("es.nodes.client.only", "false")&lt;/P&gt;&lt;P&gt; #.option("spark.es.scheme", "https")&lt;/P&gt;&lt;P&gt; #.option("spark.serializer", "org.apache.spark.serializer.KryoSerializer")&lt;/P&gt;&lt;P&gt; #.option("spark.es.http.timeout", "10m")&lt;/P&gt;&lt;P&gt; #.option("es.net.ssl.keystore.type","CRT")&lt;/P&gt;&lt;P&gt; #.option("es.net.ssl.truststore.location","/etc/ssl/certs/ca-certificates.crt")&lt;/P&gt;&lt;P&gt; .load( f"{index}" )&lt;/P&gt;&lt;P&gt; 
)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;display(df)&lt;/P&gt;&lt;P&gt;----------------&lt;/P&gt;&lt;P&gt;Error screenshot:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="ErrorScreenshot"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/133iC6A7DC5DAC1A8CC4/image-size/large?v=v2&amp;amp;px=999" role="button" title="ErrorScreenshot" alt="ErrorScreenshot" /&gt;&lt;/span&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The curl command works fine:&lt;span class="lia-inline-image-display-wrapper" image-alt="Screenshot 2023-06-01 at 1.25.29 PM"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/137i3A375D03B6EFAA9F/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screenshot 2023-06-01 at 1.25.29 PM" alt="Screenshot 2023-06-01 at 1.25.29 PM" /&gt;&lt;/span&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I've tried:&lt;/P&gt;&lt;P&gt;adding all the Spark configurations during cluster creation.&lt;/P&gt;&lt;P&gt;changing the jars to org.elasticsearch:elasticsearch-hadoop:8.5.2.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Any help resolving this would be appreciated.&lt;/P&gt;</description>
      <pubDate>Thu, 01 Jun 2023 09:22:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unable-to-read-data-from-elasticsearch-using-databricks-aws/m-p/3760#M690</guid>
      <dc:creator>naveenprabhun</dc:creator>
      <dc:date>2023-06-01T09:22:22Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to read data from ElasticSearch using Databricks (AWS)
Cannot detect ES version - 
Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [IP:PORT]</title>
      <link>https://community.databricks.com/t5/data-engineering/unable-to-read-data-from-elasticsearch-using-databricks-aws/m-p/3761#M691</link>
      <description>&lt;P&gt;I have the same problem. Did you find a solution? Thanks.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Jun 2023 19:42:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unable-to-read-data-from-elasticsearch-using-databricks-aws/m-p/3761#M691</guid>
      <dc:creator>Hoviedo</dc:creator>
      <dc:date>2023-06-06T19:42:20Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to read data from ElasticSearch using Databricks (AWS)
Cannot detect ES version - 
Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [IP:PORT]</title>
      <link>https://community.databricks.com/t5/data-engineering/unable-to-read-data-from-elasticsearch-using-databricks-aws/m-p/3762#M692</link>
      <description>&lt;P&gt;You can try adding the certificates to a truststore and storing it on the cluster. Then provide the truststore path in the Spark options &lt;B&gt;es.net.ssl.keystore.location&lt;/B&gt; and &lt;B&gt;es.net.ssl.truststore.location&lt;/B&gt;.&lt;/P&gt;</description>
      <pubDate>Wed, 07 Jun 2023 05:03:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unable-to-read-data-from-elasticsearch-using-databricks-aws/m-p/3762#M692</guid>
      <dc:creator>naveenprabhun</dc:creator>
      <dc:date>2023-06-07T05:03:20Z</dc:date>
    </item>
  </channel>
</rss>

