As promised @BS_THE_ANALYST , in this new video (and summarized in this post) I explain what Spark Caching and Databricks Disk Caching are, and how a caching strategy can leverage both of these features together:
Spark Caching vs Databricks Disk Caching

Spark Caching (Memory/Disk via cache() or persist())
Scope: Spark application / job level
How it works: When you call .cache() or .persist() on a DataFrame/RDD, Spark materializes that dataset after the first action and keeps it in executor memory (RAM). If memory is insufficient and .persist() was used, it can spill to disk depending on the storage level (MEMORY_ONLY, MEMORY_AND_DISK, etc.).
Where it lives: Inside the Spark executor JVM heap, and optionally on local disk.
Persistence: Data disappears when the Spark application ends, or if it is evicted due to memory pressure.
Best for: Reusing intermediate results across multiple actions in the same job, and iterative algorithms (ML, graph processing, etc.)
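
A minimal PySpark sketch of both options (the table and column names here are illustrative):

from pyspark.sql import SparkSession
from pyspark import StorageLevel

spark = SparkSession.builder.getOrCreate()

# Illustrative table and columns; replace with your own
df = spark.read.table("sales_transactions")

# Keep the intermediate result in executor memory after the first action
filtered = df.filter(df.amount > 100).cache()

# Or allow spilling to local disk when executor memory runs out
# filtered = df.filter(df.amount > 100).persist(StorageLevel.MEMORY_AND_DISK)

filtered.count()                                   # first action materializes the cache
filtered.groupBy("region").sum("amount").show()    # reuses the cached data, no recompute

filtered.unpersist()                               # free executor memory when done
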
Databricks Disk Caching (previously known as Delta Cache)
Scope: Cluster level
How it works: This is a transparent I/O-level cache built into the Databricks Runtime that stores data from cloud object storage (S3, ADLS, GCS) on the local NVMe SSDs of the cluster nodes. It works at the file block level and is not tied to a Spark job.
⚠️ Databricks disk caching can only be enabled on clusters that have local SSD storage.
Where it lives: Outside of the JVM, on local SSDs of the Databricks cluster and managed automatically by Databricks Runtime.
Persistence: Survives across Spark jobs running on the same cluster, cleared when the cluster is terminated or when local SSD storage is needed for something else.
Best for: Repeated reads of the same files from cloud storage across different jobs or notebooks, improving read performance from Delta tables and Parquet files
Trigger: No code change needed; automatic on DBR >= 10.4, or enabled explicitly via spark.databricks.io.cache.enabled = true
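
A minimal sketch of enabling it explicitly (the table name is illustrative):

# No effect on clusters without local SSD storage
spark.conf.set("spark.databricks.io.cache.enabled", "true")

# Reads from cloud storage are now cached transparently on local NVMe SSDs;
# the query itself does not change
df = spark.read.table("sales_transactions")
df.groupBy("region").count().show()
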
Why Together = Best Performance
Disk caching = reduces cloud I/O latency (cluster-wide).
Spark caching = reduces recomputation overhead (job-specific).

Using both ensures that data is read quickly from cloud storage (disk cache) and that expensive transformations are not recomputed (Spark cache).
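
A sketch of how the two can work together (illustrative names again): the disk cache speeds up the scan from cloud storage cluster-wide, while the Spark cache keeps the aggregated result in executor memory for reuse within the job.

spark.conf.set("spark.databricks.io.cache.enabled", "true")        # cluster-wide SSD cache for cloud reads

raw = spark.read.table("sales_transactions")                       # repeat scans served from local SSDs
daily = raw.groupBy("sale_date", "region").sum("amount").cache()   # job-level cache in executor memory

daily.count()                                                      # first action materializes both caches
daily.filter("region = 'EMEA'").show()                             # no recompute, no cloud round trip
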
https://www.youtube.com/@CafeConData