10-13-2025 04:51 AM
Hello,
I am using Databricks Free Edition. Is there a way to turn off IO caching?
I am trying to learn optimization and can't see any difference in query run time with caching enabled.
10-13-2025 05:20 AM
Hi @Hritik_Moon ,
I guess you cannot. To disable the disk cache you need to be able to run the following command:
spark.conf.set("spark.databricks.io.cache.enabled", "[true | false]")
But serverless compute does not support setting most Spark properties for notebooks or jobs; only a limited set of properties is configurable.
So, if you want a proper environment to learn Apache Spark optimization, use an OSS Apache Spark Docker container as an alternative.
10-13-2025 05:26 AM
Thanks, I have no prior experience with Docker or with getting Spark running, but I guess YouTube will help 🙂.
10-13-2025 05:29 AM
Yep, it's really simple to set up. As an added benefit, you will have full control over your environment 🙂 Here is a YouTube video that shows how to set it up:
How to Run a Spark Cluster with Multiple Workers Locally Using Docker
10-13-2025 05:31 AM
Thanks, I will be back later with additional questions 🙂.
10-13-2025 05:36 AM - edited 10-13-2025 05:37 AM
Sure, one suggestion though. If your next question is related to caching, ask it here. But if it's something completely unrelated to this topic, please start a new one.
Usually, all questions and answers should relate to the given thread; this way it's much easier for others to find what they're looking for. Also, if someone's answer solved your issue or helped you, please mark that answer as the solution for the thread.
10-16-2025 09:14 PM
1. Check whether your data is cached; you can see this in the Spark UI > Storage tab.
2. If it is not cached, add an action statement after you cache, e.g. df.count(). Data is cached on the first action that touches it. Then check the Spark UI again.
3. If you have only one action statement, you won't see any difference. But with multiple action statements, the transformations before your cached DataFrame get skipped on subsequent actions. You can see these skipped stages in the DAG.