To trigger databricks workflow on defined frequency

argl1995dbks
New Contributor III

Hi Databricks, I am trying to run a Databricks workflow on a scheduled basis (e.g., every five minutes). Here is the databricks.yaml file:

 

bundle:
  name: dab_demo

# include:
#   - resources/*.yml

variables:
  job_cluster_key:
    description: Databricks job cluster
    default: job_cluster

resources:
  jobs:
    dab_demo_job:
      name: dab_demo_job
      schedule:
        quartz_cron_expression: "0 0/5 * * * ?"
        timezone_id: "Asia/Kolkata"
      #  pause_status: "PAUSED"

      email_notifications:
        on_start:
          - arjungoel1995@gmail.com
        on_success:
          - arjungoel1995@gmail.com
        on_failure:
          - arjungoel1995@gmail.com

      tasks:
        - task_key: Sum_Task
          job_cluster_key: ${var.job_cluster_key}
          spark_python_task:
            python_file: src/sum.py

        - task_key: Hello_Task
          job_cluster_key: ${var.job_cluster_key}
          depends_on:
            - task_key: Sum_Task
          spark_python_task:
            python_file: src/hello.py

      job_clusters:
        - job_cluster_key: ${var.job_cluster_key}
          new_cluster:
            spark_version: 13.3.x-scala2.12
            node_type_id: n2-highmem-4
            autoscale:
              min_workers: 2
              max_workers: 8


targets:
  dev:
    mode: development
    default: true
    workspace:
      host:<>
 
But the issue is that the workflow is not getting triggered. Can you please tell me if there is any issue in the schedule block, and also share more information about this option? I was not able to find relevant documentation for it.
5 REPLIES

szymon_dybczak
Contributor III

Hi @argl1995dbks ,

You are deploying in development mode. In this mode, all schedules and triggers are paused.

If you would like to unpause schedules and triggers, you need to set schedule.pause_status to UNPAUSED.

Refer to the documentation:

Databricks Asset Bundle deployment modes | Databricks on AWS
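
For illustration, a minimal sketch of the schedule block with the pause status set explicitly; apart from pause_status, the values are taken from the bundle config above:

resources:
  jobs:
    dab_demo_job:
      name: dab_demo_job
      schedule:
        quartz_cron_expression: "0 0/5 * * * ?"  # every five minutes
        timezone_id: "Asia/Kolkata"
        pause_status: UNPAUSED  # development mode pauses schedules unless this is set

With pause_status set to UNPAUSED, the schedule stays active even when the target is deployed with mode: development.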

 

Hi @szymon_dybczak, thanks for the help, the issue is sorted. However, I have a question: we have many workflows which we have to trigger on a monthly and quarterly frequency. Is there a way to have a YAML file specifically for capturing the frequency and then reference it in our databricks.yaml? If yes, can you please help me with that as well?

Have a look at the examples section of the documentation. The last example shows how to modularize YAML files. I think you can try defining the schedule in separate YAML files and then reuse them by including them.

https://learn.microsoft.com/en-us/azure/databricks/dev-tools/bundles/settings#--examples
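
As a hedged illustration of that idea (the file name and job below are hypothetical), the schedule could live in its own YAML file under resources/ and be pulled in through the include block that is already commented out in the bundle config above:

# resources/monthly_job.yml (hypothetical)
resources:
  jobs:
    monthly_report_job:
      name: monthly_report_job
      schedule:
        quartz_cron_expression: "0 0 6 1 * ?"  # 06:00 on the 1st of every month
        timezone_id: "Asia/Kolkata"
        pause_status: UNPAUSED

# databricks.yaml
bundle:
  name: dab_demo

include:
  - resources/*.yml

Each included file contributes its resources to the bundle, so monthly and quarterly jobs (and their schedules) can each sit in their own file.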

 

Hi @szymon_dybczak , I followed this documentation link today, created a new custom variables YAML file, and referenced it in the databricks.yaml, but it failed at the deployment phase. There is already a discussion going on about the same issue: https://community.databricks.com/t5/machine-learning/passing-parameters-in-databricks-workflows/td-p...

argl1995dbks
New Contributor III

Hi, let me explain the current scenario: we have Databricks workflows which have DS, DE, and MLOps tasks. The workflows are meant to be triggered on a specific frequency, i.e. monthly and quarterly, and the quarterly workflow depends on the monthly workflow. We have maintained a configuration file for it. Here is an example:

 

countries:
    INDIA:
        enabled_components: ds_de
        calendar:
            - [1, 2]
            - [2, 4]
   
    Africa:
        enabled_components: all
        calendar:
            - [3, 4]
            - [5, 6]
   
    NA:
        enabled_components: all
        calendar:
            - [4, 5]
            - [4, 8]
 
Currently we are using the Databricks scheduler and will soon be planning to switch to Automic for triggering these workflows, but I have a couple of questions:

1. How do we reference this config.yaml file in the databricks.yaml file I provided above?
2. How can DAB help improve the current setup?
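
(Not an answer from the thread, just a hedged sketch of one way question 1 could be approached; the job, task key, and file path below are hypothetical.) The bundle can pass the path of config.yaml to a task as a parameter, so the task's Python code loads and parses the file itself:

resources:
  jobs:
    monthly_workflow_job:
      tasks:
        - task_key: Monthly_Task
          job_cluster_key: ${var.job_cluster_key}
          spark_python_task:
            python_file: src/monthly.py
            parameters:
              - "--config-path"
              - "config/config.yaml"  # parsed inside the task with a YAML library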
