We have created Databricks jobs, each with multiple tasks. Each task runs a 24/7 streaming query with checkpointing enabled. We want state to be preserved when we cancel and rerun a job, but when we cancel a job run, Databricks appears to kill the parent process at the OS level without waiting for the streaming queries in each task to stop. As a result, we are seeing missing data between our staging and reporting layers. Because we have to cancel and rerun the reporting job whenever there are changes or additions, this seems to be what causes the missing data. To fix it, we currently have to recompute the whole reporting layer by dropping the checkpoint, which is a very big bottleneck for us.
Is there a way to handle this, for example by making the Databricks job wait for the streaming queries to terminate gracefully before the run is cancelled?
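The only workaround we can think of so far is to replace the blocking awaitTermination() in each task with a polling loop that watches for a stop signal and stops the query between micro-batches, then create/remove that signal instead of cancelling the run. A minimal sketch of what we mean is below; the marker-file path and the source/sink/checkpoint paths are placeholders, not our real setup:

```python
import os
import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical marker file used as the stop signal instead of cancelling the job run.
STOP_SIGNAL_PATH = "/dbfs/tmp/stop_reporting_stream"

def stop_signal_present() -> bool:
    """Return True once the marker file has been created."""
    return os.path.exists(STOP_SIGNAL_PATH)

# Placeholder source, sink, and checkpoint locations.
query = (
    spark.readStream.format("delta")
    .load("/mnt/staging/events")
    .writeStream.format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/reporting")
    .start("/mnt/reporting/events")
)

# Poll instead of blocking forever, so the task can react to the stop signal.
# awaitTermination(timeout) returns True if the query has terminated, False on timeout.
while not query.awaitTermination(timeout=30):
    if stop_signal_present():
        # Let the in-flight trigger finish and pending data drain before stopping,
        # so the last committed micro-batch is safely recorded in the checkpoint.
        while query.status["isDataAvailable"] or query.status["isTriggerActive"]:
            time.sleep(10)
        query.stop()
        break
```

This would only cover tasks we control, though, and it still relies on someone (or an automation) creating the marker file before the run is cancelled. We would prefer a built-in way to tell the job to wait for stream termination on cancel, if one exists.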