I am having trouble efficiently reading in and parsing a large number of stream files in PySpark!
Context
Here is the schema of the JSON stream files that I am reading in. Blank spaces mark details redacted for confidentiality.
root
|-- location_info: ar...