This seems to be a corner case which no one has reported. If you can send a link to your notebook and verbally authorize me to view/run it, I can take a look at the issue.
Having a large number of small files or folders can significantly degrade data-loading performance. The best approach is to keep the folders/files merged so that each file is around 64 MB in size. There are different ways to achieve this: your writ...
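Here is a rough sketch of the sizing arithmetic for the write side. It assumes you can estimate the total size of the data you are about to write. The number of output files for roughly 64 MB each is the total size divided by 64 MB, rounded up. The helper names `target_file_count` and `write_compacted` are hypothetical. `repartition`, `write.mode` and `write.parquet` are standard PySpark DataFrame APIs.

```python
# Target size per output file (~64 MB, as suggested above).
TARGET_FILE_BYTES = 64 * 1024 * 1024


def target_file_count(total_bytes: int, target_bytes: int = TARGET_FILE_BYTES) -> int:
    """Hypothetical helper: how many files to write so each is roughly target_bytes."""
    if total_bytes <= 0:
        return 1
    # Integer ceiling division avoids float rounding on very large sizes.
    return (total_bytes + target_bytes - 1) // target_bytes


def write_compacted(df, path: str, total_bytes: int) -> None:
    """Hypothetical helper: rewrite a PySpark DataFrame as fewer, larger Parquet files.

    Assumes `total_bytes` is a size estimate you supply yourself.
    """
    n = target_file_count(total_bytes)
    # repartition(n) shuffles the data into n partitions; Spark typically
    # writes one file per non-empty partition.
    df.repartition(n).write.mode("overwrite").parquet(path)
```

If you only need to reduce the number of partitions, `df.coalesce(n)` avoids a full shuffle. The trade-off is that file sizes may come out less even.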
You can set the following Spark SQL property: spark.sql.parquet.compression.codec.
In SQL:
%sql set spark.sql.parquet.compression.codec=snappy
You can also set it on the sqlContext directly:
sqlContext.setConf("spark.sql.parquet.compression.codec", "sn...
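On newer Spark versions you can set the same property through the SparkSession, or override it for a single write. Below is a minimal sketch. It assumes a SparkSession named `spark` and a DataFrame `df`, and the helper names are hypothetical. Note that the configuration key has no trailing period.

```python
# The Spark SQL property that controls the default Parquet compression codec.
CODEC_KEY = "spark.sql.parquet.compression.codec"


def set_session_codec(spark, codec: str = "snappy") -> None:
    """Hypothetical helper: set the default Parquet codec for this SparkSession.

    Same effect as sqlContext.setConf(CODEC_KEY, codec) on the older API.
    """
    spark.conf.set(CODEC_KEY, codec)


def write_with_codec(df, path: str, codec: str = "snappy") -> None:
    """Hypothetical helper: choose the codec for one Parquet write only."""
    # The per-write "compression" option overrides the session-level property.
    df.write.option("compression", codec).parquet(path)
```

The session setting is convenient when every Parquet write should use the same codec. The per-write option is useful when only one table should differ.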