Efficient Parallel JSON Extraction from Elasticsearch in Azure Databricks
08-26-2024 10:26 AM
Hi Databricks community,
I'm having trouble extracting JSON data from Elasticsearch into Azure Databricks efficiently while preserving header information.
Previously, I used RDDs for parallel extraction, but they're no longer supported in Databricks. This forced me to switch to serial extraction (dump command), which runs one request at a time instead of in parallel.
Request: I'm looking for alternative parallel methods that improve extraction efficiency and preserve header information without requiring significant changes to my existing Spark SQL or DataFrame structure.
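To make the request concrete, here is a minimal sketch of the kind of DataFrame-based parallel read in question. It assumes the elasticsearch-hadoop Spark connector is installed on the cluster and that "header information" means document metadata such as `_id` and `_index`. The helper names `build_es_read_options` and `read_index`, the host, and the index name are hypothetical, and this has not been verified on a cluster where RDD APIs are restricted.

```python
def build_es_read_options(nodes, port=9200, include_metadata=True):
    """Build connector options for a DataFrame read (hypothetical helper)."""
    options = {
        "es.nodes": nodes,
        "es.port": str(port),
        # Often needed when the cluster is reached through a cloud gateway
        # and individual data nodes are not directly addressable.
        "es.nodes.wan.only": "true",
    }
    if include_metadata:
        # Asks the connector to return document metadata alongside each
        # document, in a separate metadata field.
        options["es.read.metadata"] = "true"
    return options


def read_index(spark, index, options):
    """Read an index as a DataFrame; `spark` is an existing SparkSession."""
    return (
        spark.read.format("org.elasticsearch.spark.sql")
        .options(**options)
        .load(index)
    )


# Example usage (assumed host and index name; needs a live cluster):
# opts = build_es_read_options("es.example.internal", port=9243)
# df = read_index(spark, "my-index", opts)
```

With this approach the connector splits the read across the index's shards, so parallelism comes from the DataFrame reader rather than from hand-written RDD code.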
Any suggestions or experiences would be greatly appreciated.
Thanks,
Labels:
- Delta Lake
- Spark
- Workflows
0 REPLIES

