Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

I want to run a streaming job from 6 a.m. to 5 p.m. How can I schedule this window in Databricks, or how can I stop my stream at 5 p.m.?

Bhawna_bedi
New Contributor II
 
1 ACCEPTED SOLUTION


Anonymous
Not applicable

You can use the Databricks CLI (https://docs.databricks.com/dev-tools/cli/index.html) to schedule a job, and you can also schedule a cluster to terminate through API calls. There are also integrations with tools such as Airflow and Azure Data Factory.
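For illustration, a rough sketch of what that could look like with the legacy Databricks CLI and the Clusters API (the command, file name, and placeholders below are assumptions; check the linked docs for the CLI version you use):

# Create the job from a JSON definition that contains the 6 AM schedule
# (see the job-settings sketch in a later reply)
databricks jobs create --json-file streaming-job.json

# Terminate (not permanently delete) the cluster at 5 PM from an external
# scheduler such as cron, Airflow, or Azure Data Factory, via the Clusters API
curl -X POST https://<workspace-url>/api/2.0/clusters/delete \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -d '{ "cluster_id": "<cluster-id>" }'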


5 REPLIES


merca
Valued Contributor II

You could set up the schedule to start at 6 AM with a timeout of 39,600 seconds (that is, 11 hours) and max retries set to 1. There is a downside: if your stream fails in the middle of the day, the restarted run is again allowed to run for 11 hours, regardless of when it is supposed to stop.
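As a rough sketch, those settings could look like the following in a Jobs API job definition (the job name, cron expression, and timezone are placeholders, and the exact placement of max_retries depends on the Jobs API version):

{
  "name": "daytime-stream",
  "schedule": {
    "quartz_cron_expression": "0 0 6 * * ?",
    "timezone_id": "UTC"
  },
  "timeout_seconds": 39600,
  "max_retries": 1
}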

Sandeep
Contributor III

If you are looking for a graceful stop (not stopping exactly at 5, but stopping after the micro-batch that is in progress at 5 o'clock, rather than abruptly killing the stream), you can try the following. The downside is that if the micro-batch duration is long, the stop will be delayed.

import java.time.LocalTime
import org.apache.spark.sql.streaming.StreamingQueryListener

// Listener that stops the query after the first micro-batch that completes past 17:00
val queryStopListener = new StreamingQueryListener() {
    override def onQueryStarted(queryStarted: StreamingQueryListener.QueryStartedEvent): Unit = {}

    override def onQueryTerminated(queryTerminated: StreamingQueryListener.QueryTerminatedEvent): Unit = {}

    override def onQueryProgress(queryProgress: StreamingQueryListener.QueryProgressEvent): Unit = {
      // Called after every micro-batch: if it is past 5 PM, stop the query
      val id = queryProgress.progress.id
      if (LocalTime.now().isAfter(LocalTime.parse("17:00:00"))) {
        val currentStreamingQuery = spark.streams.get(id)
        currentStreamingQuery.stop()
      }
    }
}

// Add this query listener to the session
spark.streams.addListener(queryStopListener)
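A simpler alternative sketch, assuming the driver notebook keeps running and `query` is the StreamingQuery handle returned by writeStream ... start(): poll the clock and stop the query directly.

import java.time.LocalTime

// Assumption: `query` is the StreamingQuery returned by writeStream ... start()
while (query.isActive && LocalTime.now().isBefore(LocalTime.parse("17:00:00"))) {
  query.awaitTermination(60 * 1000)  // returns after 60 s if the query is still running
}
if (query.isActive) {
  query.stop()  // request the stream to stop once the cutoff time is reached
}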

DimaP
New Contributor II

Does anybody know what will happen if I set a task timeout in Workflows for a streaming job?

merca
Valued Contributor II

If you are streaming to Delta, not much: the micro-batch in progress will fail, and the next time the stream starts it will pick up from the last successful write (thanks to ACID transactions). I don't know what happens with other formats if the stream is aborted mid micro-batch.
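For example, a minimal Delta sink sketch (inputDf and the paths are placeholders): each micro-batch is committed transactionally and the checkpoint records the committed offsets, so restarting the same query after a timeout resumes from the last successful write.

// Minimal sketch: Delta sink with a checkpoint; inputDf and the paths are placeholders
val query = inputDf.writeStream
  .format("delta")
  .outputMode("append")
  .option("checkpointLocation", "/mnt/checkpoints/daytime-stream")
  .start("/mnt/tables/daytime-stream")

// If the run is killed by a timeout, restarting the query with the same
// checkpointLocation picks up from the last committed micro-batch.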
