Re: Unable to clear cache using a pyspark session

Anonymous · ‎03-08-2023

@Maarten van Raaij: Please try the options below:

Can you please try calling spark.catalog.clearCache() on your SparkSession? This removes all cached tables and DataFrames from the in-memory cache for that session. To release a single cached DataFrame or RDD, call unpersist() on it instead.
If the above doesn't work, one possibility is that the DataFrame is too large to fit into memory and has spilled to disk. In that case, increase the memory available to Spark (for example via spark.executor.memory), or optimize your code to reduce the size of the DataFrame.
Another possibility is that the DataFrame is still referenced by other DataFrames or objects that have not been unpersisted. In that case, you will need to call unpersist() on every such reference before you can clear its cache.
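
A rough sketch of the "unpersist every reference, then clear the cache" sequence might look like the following. It assumes spark is a pyspark.sql.SparkSession and that you still hold the DataFrames you cached. The helper name release_cached is hypothetical and not part of PySpark. Only DataFrame.unpersist() and spark.catalog.clearCache() are actual PySpark calls.

```python
def release_cached(spark, dataframes=()):
    """Unpersist each given DataFrame, then clear the session-wide cache.

    Hypothetical helper (not a PySpark API). `spark` is assumed to be a
    pyspark.sql.SparkSession and `dataframes` an iterable of
    pyspark.sql.DataFrame objects that were cached or persisted earlier.
    Returns the number of DataFrames that unpersist() was called on.
    """
    released = 0
    for df in dataframes:
        # blocking=True waits until the cached blocks are actually removed
        df.unpersist(blocking=True)
        released += 1

    # Removes all cached tables and DataFrames known to this session
    spark.catalog.clearCache()
    return released
```

In a notebook this would be called as release_cached(spark, [df1, df2]), where df1 and df2 stand for whatever DataFrames you cached. If memory is still not freed afterwards, check the Storage tab of the Spark UI to see what remains persisted.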