Resolved! Pipelines with a lot of Spark caching - best practices for cleanup?
We have a situation where many concurrent Azure Data Factory notebooks are running on a single Databricks interactive cluster (Azure E8-series driver, autoscaling 1-10 E4-series workers). Each notebook reads data, does a dataframe.cache(), just to ...
Latest Reply
The cache is spilled to disk dynamically when there is not enough memory (DataFrame.cache() defaults to the MEMORY_AND_DISK storage level), so I don't see it as an issue. However, the best practice is to call unpersist() in your code once you are done with the cached DataFrame, as in the example below.
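A minimal PySpark sketch of that pattern, assuming a hypothetical input path and column name (the original thread does not include the actual code):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical input path, for illustration only.
df = spark.read.parquet("/mnt/landing/events")

# DataFrame.cache() uses MEMORY_AND_DISK by default, so partitions
# that do not fit in executor memory spill to local disk.
df.cache()

# Reuse the cached data across multiple actions.
total_rows = df.count()
daily = df.groupBy("event_date").count()
daily.write.mode("overwrite").parquet("/mnt/curated/daily_counts")

# Release the cached blocks once the DataFrame is no longer needed,
# freeing memory and disk for other notebooks sharing the cluster.
df.unpersist()
```

Note that unpersist() is non-blocking by default; call df.unpersist(blocking=True) if you need the blocks freed before the notebook continues.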