Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Schedule Databricks job based on custom calendar

DataDev
New Contributor

I want to schedule Databricks jobs based on a custom calendar, for example skipping the job run on arbitrary days or holidays.

#databricks @DataBricks @DATA 

4 REPLIES

szymon_dybczak
Esteemed Contributor III

Hi @DataDev ,

There's no out-of-the-box way in Databricks to do this. You can create pretty complex schedules using the Quartz cron syntax, but for your scenario you would need to implement custom logic.

For example, you can create a calendar-driven scheduling table that stores which jobs should run on a particular date.
Then, as the first step of your workflow, a notebook checks that table and triggers the jobs programmatically based on its entries.
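
As a rough illustration of the table lookup described above, here is a minimal sketch in plain Python. The rows, the job names, and the `jobs_due` helper are all hypothetical; in a real notebook the rows would be read from your own calendar table, and the returned job names would then be passed to whatever mechanism you use to start jobs.

```python
from datetime import date

# Hypothetical rows from a calendar-driven scheduling table: (job_name, run_date).
# In a notebook these would be loaded from your own table instead of hard-coded.
CALENDAR_ROWS = [
    ("daily_sales_load", date(2025, 9, 1)),
    ("daily_sales_load", date(2025, 9, 3)),
    ("monthly_report", date(2025, 9, 1)),
]


def jobs_due(rows, today):
    """Return the sorted, de-duplicated job names scheduled for `today`."""
    return sorted({job_name for job_name, run_date in rows if run_date == today})


print(jobs_due(CALENDAR_ROWS, date(2025, 9, 1)))  # ['daily_sales_load', 'monthly_report']
print(jobs_due(CALENDAR_ROWS, date(2025, 9, 2)))  # []
```

Keeping the lookup as a pure function of the rows and a date makes it easy to test against past or future dates before wiring it into a workflow.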

Cron Trigger Tutorial

Coffee77
New Contributor II

You can schedule it with "cron" expressions to get as close as possible to your use case. If you are not familiar with them, try a tool like https://www.freeformatter.com/cron-expression-generator-quartz.html to generate the expression.
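
For reference, here is a small sketch of what a Quartz expression looks like when placed in a job's schedule settings. The `build_schedule` helper and the example expression are my own illustration, not something from this thread; the dictionary keys follow the `schedule` block of the Databricks Jobs API as I understand it, so check them against the current docs. Note that a cron expression like this covers regular patterns only; it cannot skip holidays by itself.

```python
# Quartz cron fields: seconds minutes hours day-of-month month day-of-week [year]
WEEKDAYS_6AM = "0 0 6 ? * MON-FRI"  # 06:00, Monday to Friday


def build_schedule(cron_expression, timezone_id="UTC"):
    """Build a schedule block (hypothetical helper) for a job's settings."""
    fields = cron_expression.split()
    # Quartz expressions have 6 fields, or 7 when the optional year is given.
    if len(fields) not in (6, 7):
        raise ValueError(f"expected 6 or 7 Quartz fields, got {len(fields)}")
    return {
        "quartz_cron_expression": cron_expression,
        "timezone_id": timezone_id,
        "pause_status": "UNPAUSED",
    }


print(build_schedule(WEEKDAYS_6AM))
```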

https://www.youtube.com/@CafeConData

@Coffee77, thanks for sharing that cron generator. Until now I have typically relied on AI for help with this, so I certainly appreciate an actual purpose-built tool.

Pilsner
Contributor

Hello @DataDev 

Nice idea, I haven't thought about this before, but I like the suggestion.

If I had to implement a custom schedule, there are two ways that come to mind.

Firstly, if the schedule is relatively regular, with just an occasional day missed, you could set up a schedule as normal, but then manually pause it on the days you wish to miss. 

However, this isn't very dynamic. Alternatively, you could:

1) Write a notebook that connects to your calendar.
2) Have this notebook finish successfully on days you want the job to run, and raise an error on days you don't. The exact logic depends on your run criteria/calendar entries.
3) In your pipeline, add this new notebook as the first task, with all other tasks dependent on it.
4) Set the job to run every day (or at the minimum time interval you would like between runs).

To make this easier, you could try to export your calendar as a table and point the daily checker towards this instead.

This should achieve the desired result: each day the first notebook runs, checks your calendar to confirm whether it should proceed, and then either finishes or errors. Assuming your job dependencies are set up correctly, the other tasks will only run on days when the first notebook finishes successfully.
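
To make the checker idea a bit more concrete, here is a minimal sketch of the "finish or error" logic in plain Python. The `SKIP_DATES` set, the `SkipRunToday` exception, and the `assert_run_allowed` function are all made-up names for illustration; in practice the skip dates would come from your calendar or an exported table.

```python
from datetime import date

# Hypothetical skip days; in practice, load these from your calendar export.
SKIP_DATES = {date(2025, 12, 25), date(2026, 1, 1)}


class SkipRunToday(Exception):
    """Raised so the checker task fails and dependent tasks do not start."""


def assert_run_allowed(today, skip_dates):
    """Return True on a normal day; raise SkipRunToday on a skip day."""
    if today in skip_dates:
        raise SkipRunToday(f"{today.isoformat()} is marked as a skip day")
    return True


# In the checker notebook you would call this with the current date:
# assert_run_allowed(date.today(), SKIP_DATES)
```

One thing to keep in mind with this approach is that skip days will show up as failed runs, so any failure alerts on the job will fire on those days too.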

If you need me to expand further on any of these points, feel free to ask. Please let me know how you get on.

Regards - Pilsner
