Data Engineering

by Michael_Galli • Contributor III

04-22-2022 3:00:10 AM

4892 Views
1 reply
1 kudos

Resolved! Pipelines with a lot of Spark Caching - best practices for cleanup?

We have a situation where many concurrent Azure Data Factory notebooks run on a single Databricks interactive cluster (Azure E8 Series driver, 1-10 E4 Series workers, autoscaling). Each notebook reads data, does a dataframe.cache(), just to ...


Latest Reply

Hubert-Dudek
Esteemed Contributor III

04-22-2022 3:16:05 AM

1 kudos

If there is no room left in memory, this cache is spilled to disk dynamically, so I don't see it as an issue. However, the best practice is to call the "unpersist()" method in your code once the cached data is no longer needed. As in the example below, my answer, the cache/persist method ...
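The preview above is cut off, so the reply's own example is not shown here. As a rough sketch of the cache-then-unpersist pattern it recommends, one option is a small context manager. `cached` and `count_even_and_odd` are hypothetical names, not part of the Spark API; the only DataFrame methods assumed are `cache()`, `unpersist()`, `filter()` and `count()`, and `spark` is assumed to be an existing SparkSession.

```python
from contextlib import contextmanager


@contextmanager
def cached(df):
    """Cache `df` for the duration of a `with` block, then release it.

    Hypothetical helper, not part of Spark. It only relies on the
    DataFrame methods cache() and unpersist().
    """
    df.cache()  # lazy: the data is materialized by the first action
    try:
        yield df
    finally:
        df.unpersist()  # runs even if the block raises


def count_even_and_odd(spark):
    """Illustrative notebook step; `spark` is an existing SparkSession."""
    with cached(spark.range(1_000_000)) as df:
        even = df.filter("id % 2 = 0").count()  # first action fills the cache
        odd = df.filter("id % 2 = 1").count()   # second action can reuse it
    return even, odd
```

Because the cleanup sits in a `finally` clause, a notebook that fails midway still releases its cached blocks, which may matter when many notebooks share one interactive cluster.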


by Ryan_Chynoweth • Esteemed Contributor

06-04-2021 11:42:45 AM

5249 Views
2 replies
1 kudos

What advantage is there to Databricks caching and Spark caching?


Latest Reply

User16783853906
Contributor III

06-23-2021 6:37:33 PM

1 kudos

Delta cache accelerates data reads by creating copies of remote files in nodes’ local storage using a fast intermediate data format. The data is cached automatically whenever a file has to be fetched from a remote location. Successive reads of the sa...
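For contrast with Spark's `cache()`, here is a minimal sketch of switching the Delta (disk) cache on for a session. It assumes a Databricks cluster, where `spark.databricks.io.cache.enabled` is the relevant setting; `enable_disk_cache` is a hypothetical helper name and `spark` is an existing SparkSession. Whether the setting has any effect depends on the cluster's worker type and runtime.

```python
DISK_CACHE_KEY = "spark.databricks.io.cache.enabled"


def enable_disk_cache(spark):
    """Turn on the Databricks disk (Delta) cache for this session.

    Hypothetical helper; `spark` is an existing SparkSession on a
    Databricks cluster. Returns the value read back from the config.
    """
    spark.conf.set(DISK_CACHE_KEY, "true")
    return spark.conf.get(DISK_CACHE_KEY)
```

Once enabled, remote files are cached as they are read, so there is no per-DataFrame `cache()` call to pair with `unpersist()`.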


Databricks Community
