DataFrame.localCheckpoint() and cluster autoscaling at odds with each other

rcostanza · 06-17-2025

I have a notebook that starts by loading several dataframes and caching them with localCheckpoint(). I run this notebook on an all-purpose cluster with autoscaling enabled, with a minimum of 1 worker and a maximum of 2.

The cluster often scales up from 1 to 2 workers early in the execution, before the dataframes are cached. Later, during some long-running tasks halfway through, it often scales back down from 2 to 1 if no other notebooks are running in parallel, sometimes within 5 minutes of bringing that worker up. Because an executor is lost, all checkpoints stored on it are lost as well, which causes errors later when I try to use those dataframes.

I understand the alternative is to use .checkpoint() instead. But before trying that, is there a way to prevent the cluster from downscaling during execution? Or a way to tune it so it won't downscale within X minutes of scaling up?
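
For context, the .checkpoint() route would look roughly like the sketch below. It assumes the notebook's existing `spark` session; the helper name `checkpoint_reliably` and the checkpoint path in the usage comment are hypothetical, and the directory would need to be on storage that survives the loss of an executor.

```python
def checkpoint_reliably(spark, df, checkpoint_dir):
    """Checkpoint `df` to a shared directory instead of executor-local storage.

    `spark` is an existing SparkSession and `df` a DataFrame; `checkpoint_dir`
    is an assumed path on storage that outlives any single worker.
    """
    # A checkpoint directory has to be set before DataFrame.checkpoint() is called.
    spark.sparkContext.setCheckpointDir(checkpoint_dir)
    # eager=True materializes the checkpoint immediately and returns a new
    # DataFrame backed by the checkpointed files.
    return df.checkpoint(eager=True)


# Usage in a notebook (table name and path are placeholders, not real values):
# orders = checkpoint_reliably(spark, spark.table("orders"), "dbfs:/tmp/checkpoints/my_notebook")
```

The trade-off is that writing to a checkpoint directory is generally slower than localCheckpoint(), but the data no longer lives only on a worker that autoscaling may remove.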
