The Spark connector is super slow. I found that loading the JSON into Azure Cosmos DB and then writing queries to pull out sections of the data was 25x faster, because Cosmos DB indexes the JSON. You can also stream-read data from Cosmos DB. You can find Python code sn...
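A minimal sketch of the load-then-query approach using the `azure-cosmos` Python SDK. This is not the code the answer refers to. It assumes the JSON file holds a list of objects, each with a string `id`, and that the database and container already exist. The names `mydb`, `mycontainer`, the `COSMOS_URL`/`COSMOS_KEY` environment variables, and the helpers `build_range_query` and `load_and_query` are all hypothetical. The query is parameterized so that Cosmos DB's automatic indexing can serve the range filter.

```python
import json
import os
from typing import Any, Dict, Iterable, List, Tuple


def build_range_query(field: str, low: Any, high: Any) -> Tuple[str, List[Dict[str, Any]]]:
    """Build a parameterized Cosmos DB SQL query selecting documents whose
    `field` lies in [low, high]. `field` is a hypothetical property name."""
    # Only allow plain identifiers, since the field name is interpolated into the SQL text.
    if not field.isidentifier():
        raise ValueError(f"unsafe field name: {field!r}")
    query = f"SELECT * FROM c WHERE c.{field} >= @low AND c.{field} <= @high"
    parameters = [
        {"name": "@low", "value": low},
        {"name": "@high", "value": high},
    ]
    return query, parameters


def load_and_query(json_path: str, field: str, low: Any, high: Any) -> Iterable[Dict[str, Any]]:
    """Upsert every object from a JSON array file, then query a slice of it.
    Database/container names and env vars are hypothetical placeholders."""
    from azure.cosmos import CosmosClient  # pip install azure-cosmos

    client = CosmosClient(os.environ["COSMOS_URL"], credential=os.environ["COSMOS_KEY"])
    container = client.get_database_client("mydb").get_container_client("mycontainer")

    # One upsert per document keeps the sketch simple; large loads would want a bulk path.
    with open(json_path) as f:
        for doc in json.load(f):
            container.upsert_item(doc)  # each doc needs a string "id"

    query, parameters = build_range_query(field, low, high)
    return container.query_items(
        query=query,
        parameters=parameters,
        enable_cross_partition_query=True,
    )
```

For example, `load_and_query("data.json", "year", 2019, 2021)` would return an iterable of the matching documents, which can be consumed lazily instead of scanning the whole dataset.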