I have a small DLT pipeline (under 20 tables, all streaming) running in triggered mode, scheduled every 15 minutes during the workday. For development I've set `pipelines.clusterShutdown.delay` so the cluster stays warm and I don't have to start a new one for every update.
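For reference, the relevant part of the pipeline settings looks roughly like this (the delay value and the rest of the spec are illustrative, not my exact config):

```json
{
  "continuous": false,
  "configuration": {
    "pipelines.clusterShutdown.delay": "120m"
  }
}
```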
I've noticed that update runtimes get progressively worse the longer the cluster stays up, roughly doubling after only 2 hours. The slowdown happens even on runs where there is no new data for any of the tables; each individual table's update duration stays low, but the overall update runtime keeps growing. Eventually we have to let the compute shut down and restart just to regain performance.
Cluster metrics show nothing out of the ordinary: free memory slowly decreases over time but never runs out, and CPU load stays well below its limit even at peak. There's nothing obviously wrong in the logs either.
I'm assuming periodic cluster restarts are expected to some degree, but if this were a continuous pipeline instead, where the cluster stays up until it's manually shut down, wouldn't the issue be even more pronounced?
Is there a way to mitigate this without restarting the cluster several times a day?