Data Engineering

Efficient Parallel JSON Extraction from Elasticsearch in Azure Databricks

OmarCardoso
New Contributor

Hi Databricks community,

I'm facing a challenge in efficiently extracting JSON data from Elasticsearch in Azure Databricks while preserving header information.

Previously, I used RDDs for parallel extraction, but they're no longer supported in Databricks, which forced me to switch to serialized extraction (dump command).

Request: I'm seeking advice on alternative parallel methods that can improve efficiency and maintain header information without requiring significant changes to my existing Spark SQL or DataFrame structure.

Any suggestions or experiences would be greatly appreciated.

Thanks,

1 REPLY

mark_ott
Databricks Employee

To extract JSON data from Elasticsearch in Azure Databricks efficiently, while maintaining header information and without reverting to legacy RDD-based parallelization, you can use a few modern Spark-based strategies. Spark DataFrames and Spark SQL support parallel processing natively, and several approaches have emerged to optimize extraction and structure retention.

Use DataFrame Parallelism

Spark DataFrames inherently support distributed, parallel processing. When reading JSON from sources such as Elasticsearch, you can configure read options and partitioning without explicitly using RDDs; Spark handles the distribution under the hood. The Spark Elasticsearch connector and JDBC-based readers are designed for this and let you load data directly into DataFrames using .read.format("org.elasticsearch.spark.sql") or similar .read calls.
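
As a minimal sketch, assuming the elasticsearch-hadoop connector library is attached to the cluster (host and index names are placeholders):

python
# Read an Elasticsearch index straight into a DataFrame via the connector.
# Assumes the elasticsearch-hadoop library is installed on the cluster;
# "your-host" and "your_index" are placeholders.
df = (spark.read.format("org.elasticsearch.spark.sql")
      .option("es.nodes", "your-host")
      .option("es.port", "9200")
      .option("es.nodes.wan.only", "true")  # typical for cloud-hosted clusters
      .load("your_index"))

df.printSchema()  # schema, including header fields, inferred from the index mapping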

Leverage JDBC Connector for Elasticsearch

A recommended method is using a JDBC connector, either native or third-party (e.g., the CData JDBC Driver), to read from Elasticsearch in parallel. The JDBC driver optimizes query pushdown and handles metadata (including headers) efficiently, allowing Spark to pull data in parallel partitions without serialization bottlenecks. You can register the resulting DataFrame as a temp view for direct use with Spark SQL, maintaining schema/header information.
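
For illustration, a sketch of a partitioned JDBC read; the URL format and driver class depend on the driver you use, and the partition column is a hypothetical numeric field:

python
# Partitioned JDBC read: Spark issues one query per partition in parallel.
# The URL format depends on your JDBC driver (e.g., CData); "doc_id" is a
# hypothetical numeric field used only to split the read.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:elasticsearch://your-host:9200")
      .option("dbtable", "your_index")
      .option("user", "user")
      .option("password", "pass")
      .option("partitionColumn", "doc_id")
      .option("lowerBound", "0")
      .option("upperBound", "1000000")
      .option("numPartitions", "8")
      .load())

df.createOrReplaceTempView("elasticsearch_data")  # schema/headers preserved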

Optimize with Partitioning and Schema Options

Whether using a direct connector or JDBC, you can tune read options, such as option("es.read.field.include", "header,data") to project only the fields you need, or scroll/batch-size settings, to facilitate efficient, parallel reads. This also ensures header or metadata columns are kept intact. When loading files, using .option("header", "true") (for CSVs) or explicit schema hints and Delta Lake/Auto Loader in Databricks can help preserve header structure.
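
A sketch of such options on the Elasticsearch connector (option names are from the elasticsearch-hadoop documentation; the field names "header" and "data" are placeholders for your document structure):

python
# Projection and sizing options for a parallel Elasticsearch read.
df = (spark.read.format("org.elasticsearch.spark.sql")
      .option("es.nodes", "your-host")
      .option("es.read.field.include", "header,data")       # keep only these fields
      .option("es.scroll.size", "5000")                     # docs per scroll batch
      .option("es.input.max.docs.per.partition", "100000")  # cap docs per partition
      .load("your_index"))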

Pandas UDFs and Native Spark Functions

If you need additional processing, replacing plain Python UDFs with Pandas UDFs or using native Spark SQL expressions can retain efficiency. Pandas UDFs leverage Apache Arrow for fast serialization, supporting parallel operations and maintaining the DataFrame schema.
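
For example, a minimal Pandas UDF applied to the DataFrame loaded above (the column name is a placeholder):

python
import pandas as pd
from pyspark.sql.functions import pandas_udf

# Series-to-Series Pandas UDF: Arrow ships each batch to pandas, the transform
# runs vectorized, and Spark executes it in parallel across partitions.
@pandas_udf("string")
def normalize_header(header: pd.Series) -> pd.Series:
    return header.str.strip().str.lower()

result = df.withColumn("header_field", normalize_header("header_field"))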

Practical Example

python
# Using JDBC for parallel extraction
jdbcUrl = "jdbc:elasticsearch://your-host:9200"
properties = {"user": "user", "password": "pass"}

df = spark.read.jdbc(url=jdbcUrl, table="your_index", properties=properties)
df.createOrReplaceTempView("elasticsearch_data")
result = spark.sql("SELECT header_field, data_field FROM elasticsearch_data")

This approach maintains structural integrity and supports Spark’s parallelism.

Summary Table

Method | Parallelism Supported | Header Info Maintained | Changes Required
Spark DataFrame (Connector) | Yes | Yes | Minimal
JDBC Connector | Yes | Yes | Minimal
DataFrame .option() usage | Yes | Yes | None (SQL/DF syntax)
Pandas UDFs | Yes | Yes | Slight (function defs)

Recommendations

  • Use the official Spark-ES DataFrame connector or JDBC for native parallel reads.

  • Set partitioning/batch options to maximize throughput.

  • Specify schema or use Auto Loader/schema hints for header preservation (a minimal sketch follows this list).

  • Avoid legacy RDD and plain Python UDF approaches; pivot to Pandas UDFs and Spark SQL functions for transformations.

  • Minimal code changes to your Spark SQL/DataFrame logic are needed.
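
As referenced in the list above, a minimal Auto Loader sketch with a schema hint, assuming the extracted JSON has first been landed in cloud storage (paths and the hinted column name are placeholders):

python
# Auto Loader streaming read of landed JSON; schemaHints pins the header
# column's type so it survives schema inference. Paths are placeholders.
df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.schemaLocation", "/mnt/schemas/es_extract")
      .option("cloudFiles.schemaHints", "header_field STRING")
      .load("/mnt/raw/es_extract/"))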

These options should allow you to efficiently parallelize Elasticsearch extraction in Databricks, keep header information, and stay within modern Spark API frameworks.
