Databricks Community

AdamRink · ‎02-14-2022

We are having issues with checkpoints and schema versions getting out of date (no idea why), but it causes jobs to fail. We have jobs that are running 15-30 streaming queries, so if one fails, that creates an issue. I would like to trap the checkpoint errors, and just reset the checkpoint and log a failure. Not optimal because then I'm reprocessing the stream or at least the window we are looking at.

So the problem I have is it seems the only way to error trap the stream is to use an awaitTermination()... this locks up notebook and the next streams won't start until the first stream is terminated. The awaitAnyTermination() won't catch when the job starts up and an error occurs because the job stops before hitting the awaitAnyTermination?

AdamRink · ‎02-15-2022

Thx Kaniz

AdamRink · ‎02-23-2022

The problem is that on startup if a stream fails, it would never hit the awaitAnyTermination? I almost want to take that while loop and put it on a background thread to start that at the beginning and then fire all the streams afterward... not sure if that is possible?