04-03-2025 06:06 AM
I saw how to schedule a workflow using the UI, but not with a Python script. Can someone help me figure out how to schedule a workflow hourly from a Python script? Thank you.
04-03-2025 06:57 AM
Hey @bricks3
If you're looking to schedule a workflow to run hourly using Python, here's some clarification and guidance:
To create and schedule a new workflow programmatically, you should use the API.
If you want to create a new job and include the hourly schedule, use this:
POST /api/2.2/jobs/create
This lets you define the job and its scheduling in one go.
If the job already exists and you simply want to add or modify the schedule, use this:
POST /api/2.2/jobs/update
The scheduling configuration uses Quartz cron expressions. For an hourly schedule, you can use something like the following (this example fires every hour, at second 20 of minute 30; use "0 0 * * * ?" to run at the top of each hour):
"schedule": {
    "quartz_cron_expression": "20 30 * * * ?",
    "timezone_id": "Europe/London",
    "pause_status": "UNPAUSED"
}
If you're using the Databricks UI:
Go to Workflows, and then in the right panel click "Schedule and Workflows". There you can select the Schedule interval and configure it to run hourly, daily, etc., using the graphical interface.
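Putting it together in a Python script, here is a minimal sketch using the requests library (the host, token, and job ID are placeholders you would replace; the cron expression runs at the top of every hour):

import requests

# Placeholders - replace with your workspace URL, a valid token, and your job ID
HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"
JOB_ID = 123

# Add (or overwrite) the schedule on an existing job via the update endpoint
resp = requests.post(
    f"{HOST}/api/2.2/jobs/update",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "job_id": JOB_ID,
        "new_settings": {
            "schedule": {
                "quartz_cron_expression": "0 0 * * * ?",  # hourly, on the hour
                "timezone_id": "UTC",
                "pause_status": "UNPAUSED",
            },
        },
    },
)
resp.raise_for_status()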
Hope this helps, 🙂
Isi
04-03-2025 07:05 AM
You can use the Databricks SDK or the Databricks REST API to achieve this.
The Databricks SDK uses the API under the hood, but it is more secure. I will share the links to both, and you can choose according to your use case.
Databricks API
- If the job is already created and you want to update it to add a schedule: https://docs.databricks.com/api/workspace/jobs/update#new_settings-schedule
- If you want to create a completely new job: https://docs.databricks.com/api/workspace/jobs/create#schedule
Databricks SDK
- If the job is already created and you want to update it to add a schedule: you can use the list function to find your workflow, then call the update function with the new settings.
- If you want to create a new job: you can use the create function.
Using the SDK is a little more complex because you will need to find the right set of attributes and functions to use, but it is worth trying; you can even send the documentation link to an LLM and ask it for help. A sketch follows after the link below.
https://databricks-sdk-py.readthedocs.io/en/latest/workspace/jobs/jobs.html
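For instance, here is a minimal sketch with the Python SDK, assuming the databricks-sdk package is installed; the job name "hello-job" is a placeholder for your own job:

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # picks up credentials from env vars or ~/.databrickscfg

# Look up the existing job by name ("hello-job" is a placeholder),
# then update only its schedule; other settings are left as they are.
job = next(w.jobs.list(name="hello-job"))
w.jobs.update(
    job_id=job.job_id,
    new_settings=jobs.JobSettings(
        schedule=jobs.CronSchedule(
            quartz_cron_expression="0 0 * * * ?",  # hourly, on the hour
            timezone_id="UTC",
            pause_status=jobs.PauseStatus.UNPAUSED,
        )
    ),
)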
04-03-2025 08:11 AM
@Isi @ashraf1395 Thank you for your replies. I am using DABs; how do I use this configuration in DABs? I cannot edit the workflow in the web UI, so I want to put this configuration in the DABs YAML files. I think DABs uses Terraform, and Terraform calls this API, if I am right.
04-03-2025 08:45 AM - edited 04-03-2025 08:45 AM
You can update your DAB file (databricks.yaml) with cron syntax as below, under jobs.
resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          existing_cluster_id: 1234-567890-abcde123
          notebook_task:
            notebook_path: ./hello.py
      schedule:
        quartz_cron_expression: "0 0 * * * ?"
        timezone_id: "UTC"
        pause_status: UNPAUSED
Hope this helps.
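Note that after changing the YAML you need to redeploy the bundle (for example with databricks bundle deploy) for the new schedule to take effect.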
04-03-2025 08:47 AM
Hey @bricks3,
Exactly. As far as I know, you define the workflow configuration in the YAML file, and under the hood DABs handles the API calls to Databricks (including scheduling).
To run your workflow hourly, you just need to include the schedule block inside your DABS YAML definition like this:
resources:
  jobs:
    my_workflow:
      name: "My Hourly Job"
      tasks:
        - task_key: "main_task"
          notebook_task:
            notebook_path: "/Workspace/Path/To/Notebook"
          job_cluster_key: "cluster"  # assumes a matching entry under job_clusters
      schedule:
        quartz_cron_expression: "0 0 * * * ?"
        timezone_id: "UTC"
        pause_status: "UNPAUSED"
That should be all 🙂
Isi