
How to schedule a workflow in a Python script

bricks3
New Contributor III

I saw how to schedule a workflow using the UI, but not with a Python script. Can someone help me find out how to schedule a workflow hourly in a Python script? Thank you.


5 REPLIES

Isi
Contributor

Hey @bricks3 

If you’re looking to schedule a workflow to run hourly using Python, here’s some clarification and guidance:
To create and schedule a new workflow programmatically, you should use the API.

  • If you want to create a new job and include the hourly schedule, use:

    POST /api/2.2/jobs/create

    This lets you define the job and its scheduling in one go.

  • If the job already exists and you simply want to add or modify the schedule, use:

    POST /api/2.2/jobs/update

    This endpoint allows you to update an existing job's settings, including its schedule.

The scheduling configuration uses Quartz cron expressions, whose fields are seconds, minutes, hours, day of month, month, and day of week. For an hourly schedule, you can use:

    "schedule": {
      "quartz_cron_expression": "20 30 * * * ?",
      "timezone_id": "Europe/London",
      "pause_status": "UNPAUSED"
    }

This example fires once an hour, at minute 30 and second 20; "0 0 * * * ?" would fire at the top of every hour instead.
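
If you want to drive this from a plain Python script, a minimal sketch of the update call using the requests library might look like the following; the workspace URL, token, and job ID are placeholders you would replace with your own:

import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder workspace URL
TOKEN = "<personal-access-token>"                       # placeholder token
JOB_ID = 123456789                                      # placeholder job ID

# Add (or replace) an hourly schedule on an existing job via POST /api/2.2/jobs/update.
response = requests.post(
    f"{HOST}/api/2.2/jobs/update",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "job_id": JOB_ID,
        "new_settings": {
            "schedule": {
                "quartz_cron_expression": "0 0 * * * ?",  # top of every hour
                "timezone_id": "UTC",
                "pause_status": "UNPAUSED",
            }
        },
    },
)
response.raise_for_status()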


If you’re using the Databricks UI:

Go to Workflows, and then in the right panel click “Schedule and Workflows”. There you can select the Schedule interval and configure it to run hourly, daily, etc., using the graphical interface.

Hope this helps, 🙂

Isi

ashraf1395
Honored Contributor

You can use the Databricks SDK or the Databricks REST API to achieve this.

The Databricks SDK uses the API under the hood, but it is more secure. I will share the links to both; you can choose according to your use case.

Databricks API
- If the job is already created and you want to update it to add a schedule: https://docs.databricks.com/api/workspace/jobs/update#new_settings-schedule
- If you want to create a completely new job: https://docs.databricks.com/api/workspace/jobs/create#schedule

Databricks SDK
- If the job is already created and you want to update it to add a schedule: you can use the list function to find your workflow, then call the update function and pass the new job settings there (see the sketch after the link below).
- If you want to create a new job: you can use the create function.

The SDK route is a little more complex because you need to find the right set of attributes and functions to use, but it is worth trying; you can even send the documentation link to an LLM and ask it for help.
https://databricks-sdk-py.readthedocs.io/en/latest/workspace/jobs/jobs.html
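
For example, a minimal sketch with the Databricks SDK for Python might look like this; it assumes authentication is already configured (environment variables or a .databrickscfg profile), and the job name, cluster ID, and notebook path are placeholders:

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # picks up credentials from the environment or ~/.databrickscfg

hourly = jobs.CronSchedule(
    quartz_cron_expression="0 0 * * * ?",  # top of every hour
    timezone_id="UTC",
    pause_status=jobs.PauseStatus.UNPAUSED,
)

# Update an existing job: look it up by name, then attach the schedule.
job = next(j for j in w.jobs.list() if j.settings.name == "my-existing-job")
w.jobs.update(job_id=job.job_id, new_settings=jobs.JobSettings(schedule=hourly))

# Or create a new job with the schedule attached from the start.
w.jobs.create(
    name="my-new-hourly-job",
    tasks=[
        jobs.Task(
            task_key="main",
            existing_cluster_id="1234-567890-abcde123",
            notebook_task=jobs.NotebookTask(notebook_path="/Workspace/Path/To/Notebook"),
        )
    ],
    schedule=hourly,
)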




bricks3
New Contributor III

@Isi @ashraf1395 Thank you for your replies. I am using DABs; how do I use this configuration in DABs? I cannot edit the workflow in the web UI, so I want to put this configuration in the DABs YAML files. I think DABs uses Terraform, and Terraform calls this API, if I am right.

srinum89
New Contributor II

You can update your DAB file (databricks.yaml) with cron syntax as below, under jobs.

resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          existing_cluster_id: 1234-567890-abcde123
          notebook_task:
            notebook_path: ./hello.py
      schedule:
        quartz_cron_expression: "0 0 * * * ?"
        timezone_id: "UTC"
        pause_status: UNPAUSED
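
After adding the schedule block, redeploying the bundle (for example with the databricks bundle deploy CLI command) should apply the schedule to the job; you can run databricks bundle validate first to check the YAML.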

 

Hope this helps.

Isi
Contributor

Hey @bricks3,

Exactly. As far as I know, you define the workflow configuration in the YAML file, and under the hood DABs handles the API calls to Databricks (including scheduling).

To run your workflow hourly, you just need to include the schedule block inside your DABs YAML definition, like this:

resources:
  jobs:
    my_workflow:
      name: "My Hourly Job"
      tasks:
        - task_key: "main_task"
          notebook_task:
            notebook_path: "/Workspace/Path/To/Notebook"
          job_cluster_key: "cluster"
      schedule:
        quartz_cron_expression: "0 0 * * * ?"
        timezone_id: "UTC"
        pause_status: "UNPAUSED"

That should be all 🙂

Isi
