Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Michael_Galli
by Contributor II
  • 2402 Views
  • 1 replies
  • 1 kudos

Resolved! Pipelines with a lot of Spark Caching - best practices for cleanup?

We have a situation where many concurrent Azure Data Factory notebooks run on a single Databricks interactive cluster (Azure E8-series driver, 1-10 E4-series workers, autoscaling). Each notebook reads data, does a dataframe.cache(), just to ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

This cache spills to disk automatically when it does not fit in memory, so I don't see it as an issue. However, the best practice is to call the "unpersist()" method in your code once you are done with the cached data. As in the example below, my answer, the cache/persist method ...
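
One way to make the unpersist() advice hard to forget is to wrap the cache in a context manager. The following is only a minimal sketch, not the code from the reply: the helper `cached` and the step `summarize` are hypothetical names, and the only assumption about `df` is that it exposes the cache(), unpersist(), count() and distinct() methods that a PySpark DataFrame has.

```python
from contextlib import contextmanager


@contextmanager
def cached(df):
    """Cache a DataFrame for the duration of a with-block, then release it.

    `cached` is a hypothetical helper; it relies only on the
    cache() and unpersist() methods of the object passed in.
    """
    df.cache()  # lazy: the data is materialized by the first action
    try:
        yield df
    finally:
        # Runs even if the block raises, so a failed notebook does not
        # leave cached blocks behind on a shared cluster.
        df.unpersist()


def summarize(df):
    """Hypothetical pipeline step that reuses `df` for two actions."""
    with cached(df) as c:
        total = c.count()
        distinct = c.distinct().count()
    return total, distinct
```

In a notebook this would be called as `summarize(spark.read.parquet(path))`, where `path` stands for whatever source the notebook reads. Because the release happens in a `finally` clause, the cache is dropped whether the step succeeds or fails, which matters when many notebooks share one interactive cluster.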

Ryan_Chynoweth
by Honored Contributor III
  • 1682 Views
  • 2 replies
  • 1 kudos
Latest Reply
User16783853906
Contributor III
  • 1 kudos

Delta cache accelerates data reads by creating copies of remote files in nodes’ local storage using a fast intermediate data format. The data is cached automatically whenever a file has to be fetched from a remote location. Successive reads of the sa...

1 More Replies