JargerBiirli
Databricks Partner

I'm facing this exact issue, only with a standard job instead of a DLT pipeline. Due to things out of my control, I can't use serverless or periodically restart the cluster. Any specific advice on diagnosing and resolving it? I don't think it can be checkpoint bloat, since restarting the cluster fixes the issue temporarily.