Data Engineering

[Databricks Asset Bundles] Triggering Delta Live Tables

jonhieb
New Contributor III

I would like to know how to schedule a DLT pipeline using Databricks Asset Bundles (DABs).

I'm trying to trigger a Delta Live Tables pipeline using a bundle. Below is my YAML:

resources:
  pipelines:
    data_quality_pipelines:
      name: data_quality_pipelines
      trigger:
        cron:
          quartz_cron_schedule: "0 0 10 ? * Mon-Fri"
          timezone_id: "America/Sao_Paulo"

      continuous: false          
      catalog: ${bundle.target}
      target: data_quality
      serverless: true

      libraries:
        - notebook:
            path: ../src/customfield_pipeline.ipynb
        - notebook:
            path: ../src/customfieldvalue_pipeline.ipynb
        - notebook:
            path: ../src/customer_pipeline.ipynb
        - notebook:
            path: ../src/team_pipeline.ipynb
        - notebook:
            path: ../src/user_pipeline.ipynb

      configuration:
        env_conf_file: ${var.env_conf_file}
        rules_conf_file: ${var.rules_conf_file}

After I deploy the bundle, the following error appears:

Uploading bundle files to /Workspace/Shared/deploy/.bundle/data_quality_pipelines/prod/files...
Deploying resources...
Updating deployment state...
Deployment complete!

Error: terraform apply: exit status 1

Error: cannot update pipeline: 'trigger' property is not supported yet.

with databricks_pipeline.data_quality_pipelines,
on bundle.tf.json line 61, in resource.databricks_pipeline.data_quality_pipelines:
61: }

I saw in the official documentation (Databricks API: Create Pipeline) that the trigger argument is deprecated. They recommend using the continuous argument instead, but a schedule cannot be configured with that field.

Does anyone know how to schedule a DLT pipeline using Databricks Asset Bundles? Or should I use Databricks Workflows to orchestrate it?

 

1 ACCEPTED SOLUTION

Walter_C
Databricks Employee

As of now, Databricks Asset Bundles do not support direct scheduling of DLT pipelines using cron expressions within the bundle configuration. Instead, you can achieve scheduling by creating a Databricks job that triggers the DLT pipeline and then scheduling the job using the Databricks Jobs API or the Databricks UI.
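
A minimal sketch of that job-wrapper pattern in bundle YAML (the job and task names here are illustrative placeholders, and the ${resources.pipelines...} reference assumes the pipeline is defined in the same bundle; a fuller working example appears later in this thread):

resources:
  jobs:
    dlt_schedule_job:
      name: dlt_schedule_job

      schedule:
        quartz_cron_expression: "0 0 10 ? * Mon-Fri"
        timezone_id: "America/Sao_Paulo"

      tasks:
        - task_key: run_dlt_pipeline
          pipeline_task:
            pipeline_id: ${resources.pipelines.data_quality_pipelines.id}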


6 REPLIES


jonhieb
New Contributor III

That worked for me. Thanks!!

cyrillax
New Contributor II

So it's not possible to do this by configuring a job within a bundle using YAML? Only via the UI or API?

jonhieb
New Contributor III

No, you can do it with bundles. Create a pipeline task inside a workflow YAML file, then schedule the entire workflow in that file instead of scheduling the DLT pipeline directly.

dr-dror
New Contributor II

Can you please provide an example of how to do this?

jonhieb
New Contributor III

Of course. In this example, I use the pipeline_task argument to reference a DLT pipeline that I created previously. This lets you schedule your DLT pipeline inside your workflow.

# Job to orchestrate data_quality_pipelines DLT Pipeline.
resources:
  jobs:
    data_quality_pipelines_job:
      name: schedule_data_quality_job

      schedule:
        quartz_cron_expression: "0 0 8 ? * Mon" # At 8:00:00am, on Monday
        timezone_id: "America/Sao_Paulo"

      timeout_seconds: 3600  # 1 hour

      email_notifications:
        on_failure:
          - ${workspace.current_user.userName}
      webhook_notifications:
        on_failure:
          - id: ${var.webhook_id}

      tasks:
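        # The two pipeline tasks below have no depends_on, so they run in parallel;
        # the notification and Jira tasks fan in after both succeed.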
        - task_key: data_quality_task
          pipeline_task: 
            pipeline_id: ${var.input_tables_pipeline_id}
            full_refresh: false

        - task_key: output_data_quality_task
          pipeline_task:
            pipeline_id: ${var.output_tables_pipeline_id}
            full_refresh: false
        
        - task_key: notify_business_areas
          depends_on:
            - task_key: data_quality_task
            - task_key: output_data_quality_task
          run_job_task:
            job_id: ${var.send_notifications_job_id}
          
        - task_key: create_jira_tasks
          depends_on:
            - task_key: notify_business_areas
          run_job_task:
            job_id: ${var.create_jira_tasks_job_id}
          max_retries: 0

      run_as:
        user_name: xxxxxxxx@yyyyyyy
        
      parameters:
        - name: notification_conf_file
          default: ${var.notification_conf_file}
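
The ${var.*} pipeline and job IDs above are bundle variables that you have to supply yourself. One way (a sketch, assuming the pipeline already exists in the target workspace) is a lookup variable that resolves the ID from the pipeline's name:

variables:
  input_tables_pipeline_id:
    description: ID of the input tables DLT pipeline
    lookup:
      pipeline: "input_tables_pipeline"

Here input_tables_pipeline is a hypothetical pipeline name; replace it with the name of your actual pipeline.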

 
