How we solved the "18-Hour Running Job" problem with Data-Driven Timeouts
Hi everyone, I recently dealt with a frustrating scenario: a Databricks job that usually takes minutes ran for 18 hours without failing, quietly consuming compute and blocking downstream pipelines. The driver hadn't crashed, and the job hadn't failed—i...