Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Creating a Databricks Asset Bundle with Sequential Pipelines and Workflow using YAML

smit_tw
New Contributor III

Is it possible to create a repository with a Databricks asset bundle that includes the following pipelines?

  1. Test1 (Delta Live Table Pipeline)
  2. Test2 (Delta Live Table Pipeline)
  3. Test3 (Delta Live Table Pipeline)
  4. Workflow Job
  5. Workflow to execute the above pipelines in sequence (4 → 1 → 2 → 3).

Can you create 5 YAML files that accomplish the following:

  • Define and set up the pipelines.
  • Configure the workflow to run them in the specified sequence (4 → 1 → 2 → 3), ensuring each pipeline correctly references the ones it depends on?
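
For example, I imagine databricks.yml pulling the per-resource definitions in from separate files, roughly like this (the file names are just placeholders):

bundle:
  name: sequential_pipelines_bundle

# Pull the pipeline and job definitions from separate YAML files
include:
  - resources/test1_pipeline.yml
  - resources/test2_pipeline.yml
  - resources/test3_pipeline.yml
  - resources/workflow_job.yml
  - resources/orchestrator_job.yml
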
1 ACCEPTED SOLUTION


filipniziol
Contributor III

Hi @smit_tw,

Yes, it is possible to do so. Here is a sample YAML:

resources:
  jobs:
    sample_job:
      name: sample_job
      tasks:
        # First task: run an existing job by its ID
        - task_key: JobTask
          run_job_task:
            job_id: 1094194179990459
        # Second task: run a DLT pipeline once JobTask has finished
        - task_key: DltTask1
          depends_on:
            - task_key: JobTask
          pipeline_task:
            pipeline_id: da5fa00c-33b6-4850-8ea3-53f6e8d4b0e9
            full_refresh: false
        # Third task: run the next DLT pipeline once DltTask1 has finished
        - task_key: DltTask2
          depends_on:
            - task_key: DltTask1
          pipeline_task:
            pipeline_id: da5fa00c-33b6-4850-8ea3-53f6e8d4b0e9
            full_refresh: false
      queue:
        enabled: true
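
If the three DLT pipelines are defined in the same bundle, the job tasks can also reference them through the bundle's own resource keys instead of hard-coded IDs. A minimal sketch, with a placeholder pipeline name and notebook path:

resources:
  pipelines:
    test1:
      name: Test1
      libraries:
        - notebook:
            path: ../src/test1_pipeline.py   # placeholder path to the pipeline source

and then inside the job definition the task can point at it by key:

        - task_key: DltTask1
          depends_on:
            - task_key: JobTask
          pipeline_task:
            # resolves to the ID of the test1 pipeline defined in this bundle
            pipeline_id: ${resources.pipelines.test1.id}
            full_refresh: false
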

A possible improvement is to use a lookup to reference jobs and pipelines by their name rather than their IDs.
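
For example, bundle variables can resolve the IDs by name at deploy time (the job and pipeline names below are placeholders), and the tasks can then use ${var.upstream_job_id} and ${var.test1_pipeline_id} instead of the literal IDs:

variables:
  upstream_job_id:
    description: ID of the existing job, looked up by its name at deploy time
    lookup:
      job: "sample_upstream_job"       # placeholder job name
  test1_pipeline_id:
    description: ID of the Test1 pipeline, looked up by its name at deploy time
    lookup:
      pipeline: "Test1"                # placeholder pipeline name
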

Check this discussion:
https://community.databricks.com/t5/data-engineering/getting-job-id-dynamically-to-create-another-jo...

 


3 REPLIES

smit_tw
New Contributor III

Hello @filipniziol, thank you so much. That link is what I was looking for.

filipniziol
Contributor III

Hi @smit_tw,

Great! If this resolves your question, please consider marking it as the solution. It helps others in the community find answers more easily. 😊
