<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to stop a Streaming Job based on time of the week in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-stop-a-streaming-job-based-on-time-of-the-week/m-p/16130#M10342</link>
    <description>&lt;P&gt;Hi @Nolan Lavender, for example, if you want to stop streaming on Saturday, you could check the day of the week inside foreachBatch and run your maintenance then. The snippets below are pseudocode (here checking the clock on the driver with java.time).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;.foreachBatch { (batchDF: DataFrame, batchId: Long) =&amp;gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;if (java.time.LocalDate.now().getDayOfWeek == java.time.DayOfWeek.SATURDAY) {&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// run commands to maintain the table&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;}&lt;/P&gt;&lt;P&gt;}&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Alternatively, you can estimate how many micro-batches are processed in a week and periodically stop the streaming job. If your stream processes roughly 100 micro-batches per week, you could do something like the below.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;.foreachBatch { (batchDF: DataFrame, batchId: Long) =&amp;gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;if (batchId % 100 == 0) {&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// run commands to maintain the table&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;}&lt;/P&gt;&lt;P&gt;}&lt;/P&gt;</description>
    <pubDate>Mon, 20 Sep 2021 16:44:42 GMT</pubDate>
    <dc:creator>mathan_pillai</dc:creator>
    <dc:date>2021-09-20T16:44:42Z</dc:date>
    <item>
      <title>How to stop a Streaming Job based on time of the week</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-stop-a-streaming-job-based-on-time-of-the-week/m-p/16128#M10340</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I have an always-on job cluster triggering Spark Streaming jobs. I would like to stop this streaming job once a week to run table maintenance. I was looking to leverage the foreachBatch function to check a condition and stop the job accordingly.&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 20 Aug 2021 20:51:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-stop-a-streaming-job-based-on-time-of-the-week/m-p/16128#M10340</guid>
      <dc:creator>nolanlavender00</dc:creator>
      <dc:date>2021-08-20T20:51:20Z</dc:date>
    </item>
    <item>
      <title>Re: How to stop a Streaming Job based on time of the week</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-stop-a-streaming-job-based-on-time-of-the-week/m-p/16130#M10342</link>
      <description>&lt;P&gt;Hi @Nolan Lavender, for example, if you want to stop streaming on Saturday, you could check the day of the week inside foreachBatch and run your maintenance then. The snippets below are pseudocode (here checking the clock on the driver with java.time).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;.foreachBatch { (batchDF: DataFrame, batchId: Long) =&amp;gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;if (java.time.LocalDate.now().getDayOfWeek == java.time.DayOfWeek.SATURDAY) {&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// run commands to maintain the table&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;}&lt;/P&gt;&lt;P&gt;}&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Alternatively, you can estimate how many micro-batches are processed in a week and periodically stop the streaming job. If your stream processes roughly 100 micro-batches per week, you could do something like the below.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;.foreachBatch { (batchDF: DataFrame, batchId: Long) =&amp;gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;if (batchId % 100 == 0) {&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// run commands to maintain the table&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;}&lt;/P&gt;&lt;P&gt;}&lt;/P&gt;</description>
      <pubDate>Mon, 20 Sep 2021 16:44:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-stop-a-streaming-job-based-on-time-of-the-week/m-p/16130#M10342</guid>
      <dc:creator>mathan_pillai</dc:creator>
      <dc:date>2021-09-20T16:44:42Z</dc:date>
    </item>
    <item>
      <title>Re: How to stop a Streaming Job based on time of the week</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-stop-a-streaming-job-based-on-time-of-the-week/m-p/75991#M35127</link>
      <description>&lt;P&gt;You could also use the&amp;nbsp;&lt;A class="" href="https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#triggers" target="_blank" rel="noopener noreferrer"&gt;"Available-now micro-batch" trigger&lt;/A&gt;&lt;SPAN&gt;. It processes whatever data is available and then stops, so you can do whatever you want between runs (sleep, shut down, vacuum, etc.)&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 28 Jun 2024 03:27:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-stop-a-streaming-job-based-on-time-of-the-week/m-p/75991#M35127</guid>
      <dc:creator>mroy</dc:creator>
      <dc:date>2024-06-28T03:27:46Z</dc:date>
    </item>
  </channel>
</rss>

