Anonymous
Not applicable

If you have 5GB of data, you don't need Spark. Just use your laptop. Spark is built for scale and won't perform well on small datasets because of all the overhead that distributed execution requires.

Also, don't name a pandas DataFrame df_spark_, since the name suggests it's a Spark DataFrame. Just name it something_pdf.
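
For what it's worth, here is a rough sketch of what I mean, assuming pandas is installed. The data and the names orders_pdf and totals_pdf are made up for illustration; in practice you'd probably load your own file instead of building the frame by hand.

```python
import pandas as pd

# Hypothetical sample data standing in for whatever you would load locally.
# The _pdf suffix marks this as a pandas DataFrame, not a Spark one.
orders_pdf = pd.DataFrame(
    {
        "customer": ["a", "b", "a", "c"],
        "amount": [10.0, 5.0, 7.5, 2.5],
    }
)

# Ordinary single-machine aggregation: no cluster and no session startup.
totals_pdf = orders_pdf.groupby("customer", as_index=False)["amount"].sum()
print(totals_pdf)
```

If you later do need Spark, keeping the _pdf suffix on the pandas side makes it obvious at a glance which objects live on your laptop and which are distributed.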