Hi Databricks community,
I'm struggling to extract JSON data from Elasticsearch into Azure Databricks efficiently while preserving the header information.
Previously, I used RDDs to parallelize the extraction, but RDDs are no longer supported in our Databricks environment. As a result, I had to fall back to serial extraction using a dump command.
Request: I'm looking for alternative ways to parallelize the extraction that improve efficiency and preserve the header information, without requiring significant changes to my existing Spark SQL and DataFrame code.
Any suggestions or experiences would be greatly appreciated.
Thanks,