Hi @Matt Fury
Yes... I guess the cache gets overwritten each time you run it, because for me it took nearly the same amount of time to cache 1 million records.
However, you can check whether the table is cached or not using the .storageLevel property.
E.g. I have a table named table1. Before caching, if I run the below:
spark.table("table1").storageLevel -- Output will be StorageLevel(False, False, False, False, 1)
CACHE TABLE table1; -- Now I'm caching the table (Spark SQL statement)
spark.table("table1").storageLevel -- Output will be StorageLevel(True, True, False, True, 1)
You can get the individual flags from the StorageLevel object, like:
spark.table("table1").storageLevel.useMemory
spark.table("table1").storageLevel.useDisk
spark.table("table1").storageLevel.useOffHeap etc...
For more on storage levels, check out https://sparkbyexamples.com/spark/spark-persistence-storage-levels/
Cheers..
Uma Mahesh D