Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Cancelling a job run kills the parent process and does not wait for streaming queries to stop

Sadam97
New Contributor III

Hi,

We have created Databricks jobs, each with multiple tasks. Each task runs a 24/7 streaming query with checkpointing enabled, so state should survive a cancel-and-rerun. However, when we cancel a job run, it kills the parent process at the OS level and does not wait for the streaming queries in each task to stop. This is causing missing data between our staging and reporting layers: since we have to cancel and rerun the reporting job to deploy changes and additions, the abrupt termination appears to drop data. To recover, we have to recompute the whole reporting layer by dropping the checkpoint, which is a major bottleneck for us.

Is there a way to handle this, e.g. by prompting the Databricks job to wait for the streaming queries to terminate?

1 REPLY

Sidhant07
Databricks Employee

Hi @Sadam97 ,

This seems to be expected behaviour. 
If you are running the jobs on a job cluster:

On a job cluster, the Databricks job scheduler treats all streaming queries within a task as part of the same job execution context. If any query fails, the overall job is marked as failed and all queries are stopped; this avoids partial or inconsistent updates in automated workflows. For example, suppose you have three dependent streams (bronze → silver → gold). If the bronze stream fails with an error, the silver and gold streams have no new data to process, and the cluster would sit idle without doing any work, which is not what you want. So this is expected behaviour on a job cluster, by design.

If you are running the jobs on an interactive (all-purpose) cluster, queries are managed by individual notebook sessions, which enables isolated failures: restarting one query won't affect the others.
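The graceful pattern the question asks for — signal each stream to stop, then block until every one has drained — can be sketched in plain Python. Threads stand in for streaming queries here; in an actual Spark session you would iterate `spark.streams.active` and call `stop()` on each `StreamingQuery` before the process exits.

```python
import threading
import time

class StreamWorker:
    """Illustrative stand-in for a long-running streaming query (not a Spark API)."""

    def __init__(self, name):
        self.name = name
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Loop until asked to stop, simulating micro-batch processing.
        while not self._stop.is_set():
            time.sleep(0.05)

    def stop(self):
        # Request a stop, then block until the worker has fully terminated.
        self._stop.set()
        self._thread.join()

    @property
    def is_active(self):
        return self._thread.is_alive()

def shutdown_all(streams):
    """Stop every stream and wait for each to drain before returning."""
    for s in streams:
        s.stop()

streams = [StreamWorker(f"stream-{i}") for i in range(3)]
shutdown_all(streams)
print(all(not s.is_active for s in streams))  # prints True: all streams drained
```

The key point is that `shutdown_all` returns only after every worker has joined, which is exactly what a hard OS-level kill skips.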


Recommendation

  • For Production: If you need true isolation, schedule streaming queries as separate tasks within a job (workflow), or use distinct clusters, so that one failure does not cascade into the others.
  • For Development: Interactive clusters provide more flexibility for multi-query execution.
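As a rough illustration of the "separate tasks" recommendation, a Jobs API 2.1-style payload could look like the following. The job name, task keys, notebook paths, and cluster settings are all placeholders, not values from this thread:

```json
{
  "name": "streaming-pipeline",
  "tasks": [
    {
      "task_key": "bronze_stream",
      "notebook_task": { "notebook_path": "/Repos/pipeline/bronze" },
      "new_cluster": { "spark_version": "14.3.x-scala2.12", "node_type_id": "i3.xlarge", "num_workers": 2 }
    },
    {
      "task_key": "silver_stream",
      "notebook_task": { "notebook_path": "/Repos/pipeline/silver" },
      "new_cluster": { "spark_version": "14.3.x-scala2.12", "node_type_id": "i3.xlarge", "num_workers": 2 }
    }
  ]
}
```

Because each task runs on its own cluster, cancelling or failing one stream does not tear down the other.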