<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Unable to read data from Elasticsearch with spark in Databricks. in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/unable-to-read-data-from-elasticsearch-with-spark-in-databricks/m-p/12957#M7701</link>
    <description>&lt;P&gt;When I try to read data from Elasticsearch with Spark SQL, it throws the following error:&lt;/P&gt;&lt;P&gt;&lt;B&gt;RuntimeException: Error while encoding: java.lang.RuntimeException: scala.collection.convert.Wrappers$JListWrapper is not a valid external type for schema of string&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Caused by: RuntimeException: scala.collection.convert.Wrappers$JListWrapper is not a valid external type for schema of string&lt;/B&gt;&lt;/P&gt;&lt;P&gt;It looks like the schema generated by Spark does not match the data received from Elasticsearch.&lt;/P&gt;&lt;P&gt;Could you let me know how I can read the data from Elasticsearch, in either CSV or Excel format?&lt;/P&gt;</description>
    <pubDate>Thu, 21 Jul 2022 04:18:19 GMT</pubDate>
    <dc:creator>Data_Engineer3</dc:creator>
    <dc:date>2022-07-21T04:18:19Z</dc:date>
    <item>
      <title>Unable to read data from Elasticsearch with spark in Databricks.</title>
      <link>https://community.databricks.com/t5/data-engineering/unable-to-read-data-from-elasticsearch-with-spark-in-databricks/m-p/12957#M7701</link>
      <description>&lt;P&gt;When I try to read data from Elasticsearch with Spark SQL, it throws the following error:&lt;/P&gt;&lt;P&gt;&lt;B&gt;RuntimeException: Error while encoding: java.lang.RuntimeException: scala.collection.convert.Wrappers$JListWrapper is not a valid external type for schema of string&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Caused by: RuntimeException: scala.collection.convert.Wrappers$JListWrapper is not a valid external type for schema of string&lt;/B&gt;&lt;/P&gt;&lt;P&gt;It looks like the schema generated by Spark does not match the data received from Elasticsearch.&lt;/P&gt;&lt;P&gt;Could you let me know how I can read the data from Elasticsearch, in either CSV or Excel format?&lt;/P&gt;</description>
      <pubDate>Thu, 21 Jul 2022 04:18:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unable-to-read-data-from-elasticsearch-with-spark-in-databricks/m-p/12957#M7701</guid>
      <dc:creator>Data_Engineer3</dc:creator>
      <dc:date>2022-07-21T04:18:19Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to read data from Elasticsearch with spark in Databricks.</title>
      <link>https://community.databricks.com/t5/data-engineering/unable-to-read-data-from-elasticsearch-with-spark-in-databricks/m-p/12958#M7702</link>
      <description>&lt;P&gt;How are you reading data from Elasticsearch?&lt;/P&gt;&lt;P&gt;Are you exporting the data from Elasticsearch in JSON or CSV format and then reading it with Spark, or connecting to Elasticsearch directly?&lt;/P&gt;&lt;P&gt;If you're connecting directly, you can use the following snippet:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;df = (spark.read
      .format( "org.elasticsearch.spark.sql" )
      .option( "es.nodes",   hostname )
      .option( "es.port",    port     )
      .option( "es.net.ssl", ssl      )
      .option( "es.nodes.wan.only", "true" )
      .load( f"index/{index}" )
     )
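# Regarding the JListWrapper error in the question: it typically appears when a field
# mapped as a string in Elasticsearch holds an array in some documents. A minimal
# sketch of one possible workaround, assuming a hypothetical field named "tags" is
# the one holding arrays, is to declare it with the connector's
# es.read.field.as.array.include setting:
df = (spark.read
      .format( "org.elasticsearch.spark.sql" )
      .option( "es.nodes",   hostname )
      .option( "es.port",    port     )
      .option( "es.net.ssl", ssl      )
      .option( "es.nodes.wan.only", "true" )
      # "tags" is a placeholder; list the real array-valued fields, comma-separated
      .option( "es.read.field.as.array.include", "tags" )
      .load( f"index/{index}" )
     )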
display(df)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;If you're exporting to, say, JSON format using the elastic dump service, use the following code snippet:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;df = spark.read.json("&amp;lt;dbfs_path&amp;gt;/*.json").select("_id","_source.*")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;This works because the exported file has the following schema:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;_id:string
_index:string
_score:long
_source:struct
        col_1:&amp;lt;data_type&amp;gt;
        col_2:&amp;lt;data_type&amp;gt;
        col_3:&amp;lt;data_type&amp;gt;
        col_4:&amp;lt;data_type&amp;gt;
        col_n:&amp;lt;data_type&amp;gt;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;All your columns are nested inside _source.&lt;/P&gt;&lt;P&gt;Hope this helps.&lt;/P&gt;</description>
      <pubDate>Thu, 21 Jul 2022 10:33:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unable-to-read-data-from-elasticsearch-with-spark-in-databricks/m-p/12958#M7702</guid>
      <dc:creator>AmanSehgal</dc:creator>
      <dc:date>2022-07-21T10:33:37Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to read data from Elasticsearch with spark in Databricks.</title>
      <link>https://community.databricks.com/t5/data-engineering/unable-to-read-data-from-elasticsearch-with-spark-in-databricks/m-p/12959#M7703</link>
      <description>&lt;P&gt;Hi @Aman Sehgal,&lt;/P&gt;&lt;P&gt;I am reading the Elasticsearch data by connecting to it directly, using the snippet below:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;df = (spark.read.format("org.elasticsearch.spark.sql")
      .option("es.read.metadata", "false")
      .option("spark.es.nodes.discovery", "true")
      .option("es.net.ssl", "false")
      .option("es.index.auto.create", "true")
      .option("es.field.read.empty.as.null", "no")
      .option("es.read.field.as.array.exclude", "true")
      .option("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .option("es.nodes", "*")
      .option("es.nodes.wan.only", "true")
      .option("es.net.http.auth.user", elasticUsername)
      .option("es.net.http.auth.pass", elasticPassword)
      .option("es.resource", "indexname")
      .load()
     )&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;But I get the following runtime error:&lt;/P&gt;&lt;P&gt;&lt;B&gt;RuntimeException: Error while encoding: java.lang.RuntimeException: scala.collection.convert.Wrappers$JListWrapper is not a valid external type for schema of string&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Caused by: RuntimeException: scala.collection.convert.Wrappers$JListWrapper is not a valid external type for schema of string&lt;/B&gt;&lt;/P&gt;&lt;P&gt;Do you have a solution for it?&lt;/P&gt;&lt;P&gt;Note: I think the error occurs because the schema generated by Spark does not match the schema present in Elasticsearch.&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Thu, 21 Jul 2022 14:45:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unable-to-read-data-from-elasticsearch-with-spark-in-databricks/m-p/12959#M7703</guid>
      <dc:creator>Data_Engineer3</dc:creator>
      <dc:date>2022-07-21T14:45:12Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to read data from Elasticsearch with spark in Databricks.</title>
      <link>https://community.databricks.com/t5/data-engineering/unable-to-read-data-from-elasticsearch-with-spark-in-databricks/m-p/12960#M7704</link>
      <description>&lt;P&gt;I believe this could be a known bug reported against the Elasticsearch Spark connector for Spark 3.0.&lt;/P&gt;&lt;P&gt;The connector is maintained by the open-source community, and we don't have an ETA for the fix yet.&lt;/P&gt;&lt;P&gt;Bug details:&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/elastic/elasticsearch-hadoop/issues/1635" alt="https://github.com/elastic/elasticsearch-hadoop/issues/1635" target="_blank"&gt;https://github.com/elastic/elasticsearch-hadoop/issues/1635&lt;/A&gt;&lt;/P&gt;&lt;P&gt;You can look in the Maven repository for the latest connector that supports Spark 3.0.&lt;/P&gt;&lt;P&gt;Which DBR version is your cluster running?&lt;/P&gt;</description>
      <pubDate>Sun, 24 Jul 2022 20:44:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unable-to-read-data-from-elasticsearch-with-spark-in-databricks/m-p/12960#M7704</guid>
      <dc:creator>Prabakar</dc:creator>
      <dc:date>2022-07-24T20:44:31Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to read data from Elasticsearch with spark in Databricks.</title>
      <link>https://community.databricks.com/t5/data-engineering/unable-to-read-data-from-elasticsearch-with-spark-in-databricks/m-p/12961#M7705</link>
      <description>&lt;P&gt;Hi there @KARTHICK N,&lt;/P&gt;&lt;P&gt;Hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.&lt;/P&gt;&lt;P&gt;We'd love to hear from you.&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Mon, 05 Sep 2022 12:16:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unable-to-read-data-from-elasticsearch-with-spark-in-databricks/m-p/12961#M7705</guid>
      <dc:creator>Vidula</dc:creator>
      <dc:date>2022-09-05T12:16:37Z</dc:date>
    </item>
  </channel>
</rss>

