UmaMahesh1
Honored Contributor III

Hi @Matt Fury

Yes, I believe the cache is overwritten each time you run it; in my case, caching 1 million records took roughly the same amount of time on each run.

However, you can check whether the table is cached by looking at its .storageLevel property.

E.g. I have a table named table1. Before caching, if I run the following:

spark.table("table1").storageLevel  # output: StorageLevel(False, False, False, False, 1)

cache table table1;  -- now cache the table (Spark SQL)

spark.table("table1").storageLevel  # output: StorageLevel(True, True, False, True, 1)

You can also read the individual flags from the StorageLevel object, e.g.:

spark.table("table1").storageLevel.useMemory

spark.table("table1").storageLevel.useDisk

spark.table("table1").storageLevel.useOffHeap etc...

For more on storage levels, check out https://sparkbyexamples.com/spark/spark-persistence-storage-levels/
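And if you want to choose a storage level yourself (rather than the default used by CACHE TABLE), you can persist a DataFrame explicitly. This is just a sketch; DISK_ONLY is only an example choice.

from pyspark.sql import SparkSession
from pyspark import StorageLevel

spark = SparkSession.builder.getOrCreate()
df = spark.range(1000000)

df.persist(StorageLevel.DISK_ONLY)  # pick the level explicitly
df.count()                          # an action is needed to materialize the cache
print(df.storageLevel)              # StorageLevel(True, False, False, False, 1)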

Cheers..

Uma Mahesh D