How to disable all cache

MikeGo — Sun, 06 Oct 2024 05:54:02 GMT

Hi,

I'm trying to test some SQL performance. I run the following first:

spark.conf.set('spark.databricks.io.cache.enabled', False)

However, the second run of the same query is still much faster than the first. Is there a way to make the query start from a clean state, without any cache?

Thanks

Re: How to disable all cache

VZLA — Mon, 04 Nov 2024 13:23:21 GMT

Hi @MikeGo ,

It is not clear which cache is making your query run faster, so the most straightforward approach is to reset the SparkContext. Alternatively, these are the three ways to clear caches that I can think of off the top of my head:

// Clear all persistent RDDs from memory. You can verify its effectiveness by monitoring the Storage tab in the Spark UI.
spark.sparkContext.getPersistentRDDs.values.foreach(_.unpersist())

// Disable the Databricks IO cache, as you are currently doing.
spark.conf.set("spark.databricks.io.cache.enabled", false)

// Clear any cached tables or views, if that is what is helping.
spark.catalog.clearCache()
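The snippet above is Scala. For a Python notebook, a rough equivalent might look like the sketch below. The helper name clear_spark_caches is made up for illustration, and the RDD step goes through _jsc, a private PySpark handle to the JVM context, so treat that part as an assumption that may not work on every cluster type or access mode.

```python
def clear_spark_caches(spark):
    """Best-effort cache reset for an existing SparkSession (hypothetical helper)."""
    # Drop every cached table, view, and DataFrame tracked by the catalog.
    spark.catalog.clearCache()

    # Turn off the Databricks IO cache for this session.
    spark.conf.set("spark.databricks.io.cache.enabled", "false")

    # Unpersist any RDDs that are still marked as persistent.
    # ASSUMPTION: `_jsc` is a private attribute and may be unavailable
    # in some environments.
    released = 0
    for _rdd_id, java_rdd in spark.sparkContext._jsc.getPersistentRDDs().items():
        java_rdd.unpersist()
        released += 1

    # Number of RDDs that were unpersisted.
    return released
```

In a notebook, where spark is already defined, calling clear_spark_caches(spark) before each timed run should reset these three caches. Other layers outside Spark's control may still make a second run faster.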

Re: How to disable all cache

MikeGo — Mon, 18 Nov 2024 21:32:40 GMT

Thanks @VZLA . How do I run

spark.sparkContext.getPersistentRDDs.values.foreach(_.unpersist())

from a Databricks notebook?