10-05-2024 10:54 PM
Hi,
I'm trying to test SQL performance. I run the following first:
spark.conf.set('spark.databricks.io.cache.enabled', False)
However, the second run of the same query is still much faster than the first. Is there a way to make the query start from a clean slate, without any cache?
Thanks
- Labels: Delta Lake, Spark
Accepted Solutions
11-04-2024 05:22 AM - edited 11-04-2024 05:23 AM
Hi @Brad ,
It is not clear which cache layer is making your query run faster, so the most straightforward way is to restart the SparkContext. Alternatively, these are the three cache-clearing approaches I can think of off the top of my head:
// Clear all persisted RDDs from memory; you can verify the effect by monitoring the Storage tab in the Spark UI
spark.sparkContext.getPersistentRDDs.values.foreach(_.unpersist())
// Disable the Databricks IO cache, as you are already doing
spark.conf.set("spark.databricks.io.cache.enabled", false)
// Clear any cached tables or views, in case that is what is helping
spark.catalog.clearCache()
11-18-2024 01:32 PM
Thanks @VZLA. How do I run
spark.sparkContext.getPersistentRDDs.values.foreach(_.unpersist())
from a Databricks notebook?

