-werners-

The best way is indeed to write the extracted data out to storage first and then read it back into Spark. That way you do not burden Spark with all the API calls.
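A minimal sketch of that pattern follows, assuming the API returns one page of JSON records per call. The API calls happen in plain Python, and each page is written to its own JSON Lines file. `fetch_page` and `extract_to_jsonl` are hypothetical names for illustration only. `fake_fetch_page` stands in for a real REST call and is not any particular service's API.

```python
import json
import os
import tempfile
from typing import Callable, Dict, Iterable, List


def extract_to_jsonl(fetch_page: Callable[[int], List[Dict]],
                     pages: Iterable[int],
                     out_dir: str) -> List[str]:
    """Call the (hypothetical) API once per page, outside Spark, and write
    each page's records to its own JSON Lines file. Returns the paths written."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for page in pages:
        records = fetch_page(page)  # one API call per page, no Spark involved
        path = os.path.join(out_dir, f"page_{page:05d}.json")
        with open(path, "w", encoding="utf-8") as f:
            for rec in records:
                f.write(json.dumps(rec) + "\n")  # one JSON object per line
        paths.append(path)
    return paths


def fake_fetch_page(page: int) -> List[Dict]:
    # Hypothetical stand-in for a real API call, e.g. requests.get(url).json()
    return [{"page": page, "item": i} for i in range(3)]


if __name__ == "__main__":
    out = tempfile.mkdtemp()
    written = extract_to_jsonl(fake_fetch_page, range(2), out)
    print(len(written), "files written to", out)
```

Once the files are in a location Spark can reach, a single `spark.read.json(out_dir)` loads the whole directory into one DataFrame, because JSON Lines is the reader's default format. On Databricks, this assumes the directory is on storage the cluster can read, such as DBFS or cloud storage, and not the driver's local disk. Spark then only performs a bulk file read, and all the per-request work has already happened.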