We have created Databricks jobs, each with multiple tasks. Each task runs a 24/7 streaming query with checkpointing enabled. We want state to be preserved when we cancel and rerun a job, but when we cancel a job run, Databricks appears to kill the parent process at the OS level without waiting for the streaming queries in each task to stop. As a result, we are seeing missing data between our staging and reporting layers. Because we have to cancel and rerun the reporting job whenever there are changes or additions, this seems to be what causes the missing data. To fix it, we currently have to recompute the whole reporting layer by dropping the checkpoint, which is a very big bottleneck for us.
Is there a way to handle this, for example by making the Databricks job wait for the streaming queries to terminate gracefully before the run is cancelled?
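The only workaround we can think of so far is to replace the blocking awaitTermination() in each task with a polling loop that watches for a stop signal and stops the query between micro-batches, then create/remove that signal instead of cancelling the run. A minimal sketch of what we mean is below; the marker-file path and the source/sink/checkpoint paths are placeholders, not our real setup:

```python
import os
import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical marker file used as the stop signal instead of cancelling the job run.
STOP_SIGNAL_PATH = "/dbfs/tmp/stop_reporting_stream"

def stop_signal_present() -> bool:
    """Return True once the marker file has been created."""
    return os.path.exists(STOP_SIGNAL_PATH)

# Placeholder source, sink, and checkpoint locations.
query = (
    spark.readStream.format("delta")
    .load("/mnt/staging/events")
    .writeStream.format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/reporting")
    .start("/mnt/reporting/events")
)

# Poll instead of blocking forever, so the task can react to the stop signal.
# awaitTermination(timeout) returns True if the query has terminated, False on timeout.
while not query.awaitTermination(timeout=30):
    if stop_signal_present():
        # Let the in-flight trigger finish and pending data drain before stopping,
        # so the last committed micro-batch is safely recorded in the checkpoint.
        while query.status["isDataAvailable"] or query.status["isTriggerActive"]:
            time.sleep(10)
        query.stop()
        break
```

This would only cover tasks we control, though, and it still relies on someone (or an automation) creating the marker file before the run is cancelled. We would prefer a built-in way to tell the job to wait for stream termination on cancel, if one exists.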