09-01-2025 08:30 AM
I want to schedule Databricks jobs based on a custom calendar, for example skipping runs on random days or holidays.
#databricks @DataBricks @DATA
09-01-2025 08:48 AM - edited 09-01-2025 08:50 AM
Hi @DataDev,
There's no out-of-the-box way to do this in Databricks. You can create fairly complex schedules using Quartz cron syntax, but for your scenario you would need to implement custom logic.
For example, you can create a calendar-driven scheduling table that stores which jobs should run on which dates.
Then, in the first step of your workflow, a notebook checks that table and triggers the matching jobs programmatically based on its entries.
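A minimal sketch of the lookup step, assuming a hypothetical scheduling table with one row per (job ID, run date) pair. The filtering is plain Python so it can run anywhere; in a Databricks notebook the rows would come from that table (for example via `spark.table(...)`), and the trigger call shown in the comment assumes the Databricks SDK for Python is installed and authenticated.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CalendarEntry:
    """One row of a hypothetical calendar-driven scheduling table."""
    job_id: int       # Databricks job ID that should run
    run_date: date    # calendar date on which that job should run


def jobs_due(entries: list[CalendarEntry], today: date) -> list[int]:
    """Return the job IDs scheduled for `today`, de-duplicated, in table order."""
    due: list[int] = []
    for entry in entries:
        if entry.run_date == today and entry.job_id not in due:
            due.append(entry.job_id)
    return due


if __name__ == "__main__":
    calendar = [
        CalendarEntry(101, date(2025, 9, 1)),
        CalendarEntry(102, date(2025, 9, 1)),
        CalendarEntry(101, date(2025, 9, 2)),
    ]
    for job_id in jobs_due(calendar, date(2025, 9, 1)):
        # In a Databricks notebook this is where the job would be triggered,
        # e.g. with the Databricks SDK: WorkspaceClient().jobs.run_now(job_id=job_id)
        print(f"would trigger job {job_id}")
```

Keeping the date filtering separate from the trigger call makes the calendar logic easy to test without touching a workspace.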
09-01-2025 08:49 AM
You can schedule it using cron expressions to get as close as possible to your use case. If you are not familiar with them, try a tool like https://www.freeformatter.com/cron-expression-generator-quartz.html to generate them.
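To check what a generated expression actually means, the sketch below labels each field of a Quartz cron expression. Unlike classic Unix cron, Quartz starts with a seconds field and allows an optional seventh year field, and it requires exactly one of day-of-month or day-of-week to be `?`. The helper name is hypothetical, and it only validates the field count and the `?` rule, not the contents of each field.

```python
# Field order of a Quartz cron expression.
QUARTZ_FIELDS = (
    "seconds",
    "minutes",
    "hours",
    "day_of_month",
    "month",
    "day_of_week",
    "year",  # optional seventh field
)


def explain_quartz(expression: str) -> dict[str, str]:
    """Label each field of a 6- or 7-field Quartz cron expression."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}: {expression!r}")
    # Quartz rejects expressions where both or neither of
    # day-of-month and day-of-week are '?'.
    if (parts[3] == "?") == (parts[5] == "?"):
        raise ValueError("exactly one of day_of_month or day_of_week must be '?'")
    # zip stops at the shorter sequence, so 'year' is omitted for 6 fields.
    return dict(zip(QUARTZ_FIELDS, parts))
```

For example, `0 30 8 ? * MON-FRI` runs at 08:30:00 every weekday. Cron handles regular patterns like this well, but it has no notion of a holiday calendar, so skipping arbitrary dates still needs custom logic.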
09-01-2025 09:00 AM
@Coffee77, thanks for sharing that cron generator. In the past I have typically gotten help from AI, so I certainly appreciate an actual purpose-built tool.
09-01-2025 08:54 AM
Hello @DataDev
Nice idea! I hadn't thought about this before, but I like it.
If I had to implement a custom schedule, there are two ways that come to mind.
Firstly, if the schedule is relatively regular, with just an occasional day missed, you could set up a schedule as normal and then manually pause it on the days you wish to skip.
However, this isn't very dynamic. Alternatively, you could:
1) Write a notebook that connects to your calendar.
2) Have this notebook finish successfully on days you want the job to run, and raise an error on days you don't. The exact logic depends on your run criteria/calendar entries.
3) In your job, add this new notebook as the first task, with all other tasks dependent on it.
4) Set the job to run every day (or at the minimum time interval you would like between runs).
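A minimal sketch of the gate logic in step 2, assuming the skip days are already available as a set of dates. The weekend rule is only an example criterion added for illustration, and `SkipRun` and `check_run_day` are hypothetical names. Called at the end of the gate notebook, an uncaught exception fails that task, so downstream tasks that depend on it (with the default "All succeeded" run-if condition) do not start.

```python
from datetime import date


class SkipRun(Exception):
    """Raised to fail the gate task on days the pipeline should not run."""


def check_run_day(today: date, skip_dates: set[date], run_weekends: bool = False) -> None:
    """Return normally on run days; raise SkipRun on days to skip."""
    # Example criterion (assumption): skip Saturdays and Sundays unless enabled.
    if not run_weekends and today.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        raise SkipRun(f"{today} is a weekend; skipping downstream tasks")
    # Calendar criterion: skip any date listed in the calendar.
    if today in skip_dates:
        raise SkipRun(f"{today} is on the skip calendar; skipping downstream tasks")
```

In the notebook this would be a single call such as `check_run_day(date.today(), holidays)`. One trade-off of this design is that skipped days show up as failed runs, so any failure alerts on the job will also fire on those days.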
To make this easier, you could try to export your calendar as a table and point the daily checker towards this instead.
This should achieve the desired result: each day the first notebook runs, checks your calendar to confirm whether it should proceed, and then either finishes or errors. Assuming your job dependencies are set up correctly, the other tasks will only run on days where the first notebook finishes successfully.
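For the exported-calendar variant, the checker only needs the set of dates to skip. The sketch below assumes a hypothetical CSV export with an ISO-8601 `date` column (every listed row is a day to skip); in a Databricks notebook you would more likely read the same column from a table, but the result is the same set of dates.

```python
import csv
import io
from datetime import date


def load_skip_dates(csv_text: str, date_column: str = "date") -> set[date]:
    """Parse an exported calendar (CSV with an ISO-8601 date column) into skip dates."""
    reader = csv.DictReader(io.StringIO(csv_text))
    skip: set[date] = set()
    for row in reader:
        value = (row.get(date_column) or "").strip()
        if value:  # ignore rows with an empty date cell
            skip.add(date.fromisoformat(value))
    return skip


exported = """date,description
2025-12-25,Christmas Day
2026-01-01,New Year's Day
"""
print(sorted(load_skip_dates(exported)))
```

The returned set can then serve as the list of days on which the daily checker should stop the pipeline.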
If you need me to expand further on any of these points, feel free to ask. Please let me know how you get on.
Regards - Pilsner
3 weeks ago
Hello @DataDev!
Did the suggestions shared above help address your question? If so, please consider marking one or more responses as the accepted solution. If you found another approach that worked for you, sharing it with the community would be really helpful.