
How to schedule a workflow in a Python script

bricks3
New Contributor III

I saw how to schedule a workflow using the UI, but not with a Python script. Can someone help me find out how to schedule a workflow hourly in a Python script? Thank you.


5 REPLIES

Isi
Contributor

Hey @bricks3 

If you’re looking to schedule a workflow to run hourly using Python, here’s some clarification and guidance:
To create and schedule a new workflow programmatically, you should use the API.

  • If you want to create a new job and include the hourly schedule, use:

    POST /api/2.2/jobs/create

    This lets you define the job and its scheduling in one go.

  • If the job already exists and you simply want to add or modify the schedule, use:

    POST /api/2.2/jobs/update

    This endpoint allows you to update an existing job's settings, including its schedule.

The scheduling configuration uses Quartz cron expressions, whose fields are seconds, minutes, hours, day of month, month, and day of week. For an hourly schedule, you can use:

    "schedule": {
      "quartz_cron_expression": "20 30 * * * ?",
      "timezone_id": "Europe/London",
      "pause_status": "UNPAUSED"
    }

This example fires once an hour, at minute 30 and second 20; "0 0 * * * ?" would fire at the top of every hour instead.
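
If you want to drive this from a plain Python script, a minimal sketch of the update call using the requests library might look like the following; the workspace URL, token, and job ID are placeholders you would replace with your own:

import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder workspace URL
TOKEN = "<personal-access-token>"                       # placeholder token
JOB_ID = 123456789                                      # placeholder job ID

# Add (or replace) an hourly schedule on an existing job via POST /api/2.2/jobs/update.
response = requests.post(
    f"{HOST}/api/2.2/jobs/update",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "job_id": JOB_ID,
        "new_settings": {
            "schedule": {
                "quartz_cron_expression": "0 0 * * * ?",  # top of every hour
                "timezone_id": "UTC",
                "pause_status": "UNPAUSED",
            }
        },
    },
)
response.raise_for_status()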


If you’re using the Databricks UI:

Go to Workflows, and then in the right panel click “Schedule and Workflows”. There you can select the Schedule interval and configure it to run hourly, daily, etc., using the graphical interface.

Hope this helps, 🙂

Isi

ashraf1395
Honored Contributor

You can use the Databricks SDK or the Databricks REST API to achieve this.

The Databricks SDK uses the API under the hood, but it is more secure. I will share the links to both; you can choose according to your use case.

Databricks API
- If the job is already created and you want to update it to add a schedule: https://docs.databricks.com/api/workspace/jobs/update#new_settings-schedule
- If you want to create a completely new job: https://docs.databricks.com/api/workspace/jobs/create#schedule

Databricks SDK
- If the job is already created and you want to update it to add a schedule: you can use the list function to find your workflow, then call the update function and pass the new job settings there (see the sketch after the link below).
- If you want to create a new job: you can use the create function.

The SDK route is a little more complex because you need to find the right set of attributes and functions to use, but it is worth trying; you can even send the documentation link to an LLM and ask it for help.
https://databricks-sdk-py.readthedocs.io/en/latest/workspace/jobs/jobs.html
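
For example, a minimal sketch with the Databricks SDK for Python might look like this; it assumes authentication is already configured (environment variables or a .databrickscfg profile), and the job name, cluster ID, and notebook path are placeholders:

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # picks up credentials from the environment or ~/.databrickscfg

hourly = jobs.CronSchedule(
    quartz_cron_expression="0 0 * * * ?",  # top of every hour
    timezone_id="UTC",
    pause_status=jobs.PauseStatus.UNPAUSED,
)

# Update an existing job: look it up by name, then attach the schedule.
job = next(j for j in w.jobs.list() if j.settings.name == "my-existing-job")
w.jobs.update(job_id=job.job_id, new_settings=jobs.JobSettings(schedule=hourly))

# Or create a new job with the schedule attached from the start.
w.jobs.create(
    name="my-new-hourly-job",
    tasks=[
        jobs.Task(
            task_key="main",
            existing_cluster_id="1234-567890-abcde123",
            notebook_task=jobs.NotebookTask(notebook_path="/Workspace/Path/To/Notebook"),
        )
    ],
    schedule=hourly,
)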




bricks3
New Contributor III

@Isi @ashraf1395 Thank you for your replies. I am using DABs; how do I use this configuration in DABs? I cannot edit the workflow in the web UI, so I want to put this configuration in the DABs YAML files. I think DABs uses Terraform, and Terraform calls this API, if I am right.

srinum89
New Contributor II

You can update your DAB file (databricks.yaml) with cron syntax as below, under jobs.

resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          existing_cluster_id: 1234-567890-abcde123
          notebook_task:
            notebook_path: ./hello.py
      schedule:
        quartz_cron_expression: "0 0 * * * ?"
        timezone_id: "UTC"
        pause_status: UNPAUSED
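
After adding the schedule block, redeploying the bundle (for example with the databricks bundle deploy CLI command) should apply the schedule to the job; you can run databricks bundle validate first to check the YAML.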

 

Hope this helps.

Isi
Contributor

Hey @bricks3,

Exactly. As far as I know, you define the workflow configuration in the YAML file, and under the hood DABs handles the API calls to Databricks (including scheduling).

To run your workflow hourly, you just need to include the schedule block inside your DABs YAML definition, like this:

resources:
  jobs:
    my_workflow:
      name: "My Hourly Job"
      tasks:
        - task_key: "main_task"
          notebook_task:
            notebook_path: "/Workspace/Path/To/Notebook"
          job_cluster_key: "cluster"
      schedule:
        quartz_cron_expression: "0 0 * * * ?"
        timezone_id: "UTC"
        pause_status: "UNPAUSED"

That should be all 🙂

Isi
