We have many concurrent Azure Data Factory notebook activities running on a single Databricks interactive cluster (one Azure E8-series driver, autoscaling between 1 and 10 E4-series workers).
Each notebook reads data and calls dataframe.cache(), just to log row counts before and after a dropDuplicates() as data-quality metrics (similar to what we did in SSIS).
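Simplified, each notebook does roughly the following (source path and format are placeholders, not our real pipeline):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical input; the real notebooks read different sources.
df = spark.read.format("delta").load("/mnt/raw/some_table")
df.cache()  # cached only so the two counts don't trigger two full reads

rows_before = df.count()  # materializes the cache
deduped = df.dropDuplicates()
rows_after = deduped.count()

# Logged as data-quality metrics (SSIS-style row counts).
print(f"rows before: {rows_before}, duplicates removed: {rows_before - rows_after}")
```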
After a few hours, jobs on the cluster start failing and the cluster needs a reboot. I suspect the caching is the cause.
Is it recommended to call spark.catalog.clearCache() at the end of each notebook (and does that affect other jobs running on the same cluster?), or are there better approaches to cleaning up the cluster cache?
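For context, these are the two cleanup options I'm aware of, as I understand them (the comments reflect my assumptions, which may be wrong):

```python
# Option A: clear every cached table/DataFrame in the Spark application.
# My concern: on an interactive cluster, all attached notebooks share one
# Spark application, so this presumably also drops other notebooks' caches.
spark.catalog.clearCache()

# Option B: release only this notebook's own DataFrame cache.
df.unpersist(blocking=True)  # blocking=True waits until the blocks are freed
```

Would calling unpersist() per DataFrame at the end of each notebook be enough, or is something more needed?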