unpersist doesn't clear

anandreddy23
New Contributor II

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName('TEST')
         .config('spark.ui.port', '4098')
         .enableHiveSupport()
         .getOrCreate())

df4 = spark.sql('select * from hive_schema.table_name limit 1')
print("query completed")

df4.unpersist()  # intended to release the DataFrame's memory
df4.count()

df4.show()  # still prints the row

I executed the code above to clear the DataFrame and release its memory. However, df4.show() still works and shows the data. Could you please help me with the right method to free the memory occupied by a Spark DataFrame?

2 REPLIES

Kaniz_Fatma
Community Manager

Hi @anandreddy23, certainly! When working with Spark DataFrames, it's essential to manage memory efficiently. Let's explore the options for freeing up the memory occupied by a DataFrame:

 

df.unpersist(): This method removes the DataFrame's blocks from the cache, but it doesn't make the DataFrame unusable. A DataFrame is a description of a computation, not a container of data, so df4.show() simply recomputes the result from the source table; that is why it still prints rows. Note also that your snippet never calls persist() or cache(), so unpersist() had nothing to remove in the first place. By default the call is asynchronous; you can use the blocking parameter to block execution until the cached data is actually dropped. For example:

df4.unpersist(blocking=True)
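
To see why this doesn't make the DataFrame unusable, here is a minimal sketch (assuming the same hive_schema.table_name from the snippet above; the printed storage levels will vary with your cluster defaults):

df4 = spark.sql('select * from hive_schema.table_name limit 1')

df4.persist()                 # mark the DataFrame for caching
df4.count()                   # an action materializes the cache
print(df4.storageLevel)       # a real storage level now, e.g. memory and disk

df4.unpersist(blocking=True)  # block until the cached blocks are dropped
print(df4.storageLevel)       # back to StorageLevel(False, False, False, False, 1)

df4.show()                    # still works: Spark recomputes the plan from the source table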

Garbage Collection (GC): Spark relies on the JVM to manage memory and garbage collection, and you cannot reliably force a GC from within your Spark application. Assigning df = None won't release much memory either, because the DataFrame itself doesn't hold data; it's a description of a computation. If your application faces memory issues, consider tuning the garbage collection settings instead, as sketched below.
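
GC tuning happens through the Spark configuration rather than in application code. A minimal sketch (the G1GC flags here are purely illustrative, not a recommendation for your workload):

spark = (SparkSession.builder
         .appName('TEST')
         # illustrative GC tuning: run executors with G1GC
         .config('spark.executor.extraJavaOptions',
                 '-XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35')
         .getOrCreate())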

Catalog Clear Cache: In PySpark, spark.catalog.clearCache() removes all cached tables and DataFrames from the session's cache, not just a single one. Additionally, invoking df.checkpoint() materializes the data and truncates the lineage; note that this requires a checkpoint directory to be configured. For example:

spark.catalog.clearCache()
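
A minimal sketch combining the two (the checkpoint directory path is a placeholder; point it at a location your cluster can write to):

# checkpointing requires a directory; the path below is a placeholder
spark.sparkContext.setCheckpointDir('/tmp/spark-checkpoints')

df4 = spark.sql('select * from hive_schema.table_name limit 1')
checkpointed = df4.checkpoint()  # eagerly materializes the data and truncates the lineage

spark.catalog.clearCache()  # drops every cached table and DataFrame in the session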

Remember that how much memory is actually freed depends on several factors, including how much storage memory the cache occupies, regular execution heap usage, and overall memory availability. Choose the approach that best fits your use case and memory constraints, and if you encounter out-of-memory errors, revisit your Spark configuration and GC settings.

anandreddy23
New Contributor II

Thank you so much for taking the time to explain the concepts.
