Re: Unable to read data from Elasticsearch with sp...

AmanSehgal · 07-21-2022

How are you reading data from Elasticsearch?

Are you exporting the data from ES in JSON or CSV format and then reading it with Spark, or connecting to ES directly?

If you're connecting directly, you can use the following snippet:

# hostname, port, ssl and index must be defined beforehand
df = (spark.read
      .format("org.elasticsearch.spark.sql")
      .option("es.nodes", hostname)
      .option("es.port", port)
      .option("es.net.ssl", ssl)
      .option("es.nodes.wan.only", "true")  # connect only through the declared nodes
      .load(f"index/{index}")
     )
 
display(df)
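
If it helps, here is a minimal sketch of collecting the connector options from the snippet above into one dict, so they can be checked before the read. The helper name es_read_options, its defaults, and the host name are my own assumptions for illustration, not part of the connector.

```python
def es_read_options(hostname, port=9200, ssl=False, wan_only=True):
    """Build the connector options from the snippet above as a plain dict.

    Hypothetical helper: the name and defaults are assumptions.
    """
    return {
        "es.nodes": str(hostname),
        "es.port": str(port),
        # Options are passed as strings here, so booleans become "true"/"false".
        "es.net.ssl": str(bool(ssl)).lower(),
        "es.nodes.wan.only": str(bool(wan_only)).lower(),
    }


# Hypothetical host and port, for illustration only.
opts = es_read_options("es.example.internal", port=9243, ssl=True)
print(opts)

# With a Spark session available, the dict can be passed in one call:
# df = (spark.read.format("org.elasticsearch.spark.sql")
#       .options(**opts)
#       .load(f"index/{index}"))
```

Printing opts before the read makes it easy to spot a wrong host or port, which is a common reason a direct connection fails.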

If you're exporting to, say, JSON with a tool such as elasticdump, then use the following code snippet:

df = spark.read.json("<dbfs_path>/*.json").select("_id","_source.*")

This is because the exported file has the following schema:

_id:string
_index:string
_score:long
_source:struct
        col_1:<data_type>
        col_2:<data_type>
        col_3:<data_type>
        col_4:<data_type>
        col_n:<data_type>

All your columns are nested inside _source.
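
To make the nesting concrete, here is a plain-Python illustration (not Spark code) of what selecting "_id" and "_source.*" does to a single exported record. The record contents and the flatten_hit helper are made up for the example.

```python
import json

# One line of a hypothetical export; the field names under _source are made up.
line = (
    '{"_index": "my_index", "_id": "abc123", "_score": 1, '
    '"_source": {"col_1": "a", "col_2": 42}}'
)


def flatten_hit(hit):
    """Return _id plus every field nested under _source as top-level keys."""
    row = {"_id": hit["_id"]}
    row.update(hit.get("_source", {}))
    return row


row = flatten_hit(json.loads(line))
print(row)  # {'_id': 'abc123', 'col_1': 'a', 'col_2': 42}
```

In Spark, select("_id", "_source.*") does the equivalent expansion for every row, so col_1 ... col_n end up as regular top-level columns.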

Hope this helps.