10-13-2025 04:51 AM
Hello,
I am using Databricks Free Edition. Is there a way to turn off IO caching?
I am trying to learn optimization and can't see any difference in query run time with caching enabled.
10-13-2025 05:20 AM
Hi @Hritik_Moon ,
I guess you cannot. To disable the disk cache you need to be able to run the following command:
spark.conf.set("spark.databricks.io.cache.enabled", "[true | false]")
But serverless compute does not support setting most Spark properties for notebooks or jobs; only a limited set of properties is configurable.
So, if you want a proper environment to learn Apache Spark optimization, use an OSS Apache Spark Docker container as an alternative.
10-13-2025 05:26 AM
Thanks, I have no prior experience with Docker or with getting Spark running, but I guess YouTube will help 🙂.
10-13-2025 05:29 AM
Yep, it's really simple to set up. As an added benefit, you will have full control over your environment 🙂 Here is a YouTube video that shows how to set it up:
How to Run a Spark Cluster with Multiple Workers Locally Using Docker
10-13-2025 05:31 AM
Thanks, I will be back later with additional questions 🙂.
10-13-2025 05:36 AM - edited 10-13-2025 05:37 AM
Sure, one suggestion though. If your next question is related to caching, ask it here. But if it's something completely unrelated to this topic, please start a new one.
Usually, all questions and answers should relate to the given thread; this way it's much easier for others to find what they're looking for. Also, if someone's answer solved your issue or helped you, please mark that answer as the solution for the thread.
10-16-2025 09:14 PM
1. Check whether your data is cached; you can see this in the Spark UI > Storage tab.
2. If it is not cached, add an action statement after you cache, e.g. df.count(). Data is cached on the first action that touches it. Then check the Spark UI again.
3. If you have only one action statement, you won't see any difference. But with multiple action statements, the transformations before your cached DataFrame get skipped on subsequent actions. You can see these skipped stages in the DAG.