
databricks bundle deploy fails when job includes dbt task and git_source

stevewb
Visitor

I am trying to deploy a dbt task as part of a Databricks job using Databricks Asset Bundles.

However, there seems to be a clash when a job includes both a dbt task and a git_source, and it causes a bizarre failure.

I am using v0.237.0 of the CLI.

Minimal reproducible example:

Start with 

databricks bundle init default-python
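
For context, the template generates roughly this layout (an abridged sketch; exact names can vary by CLI version). The relative paths in the job YAML below resolve against the resources/ folder:

my_project/
├── databricks.yml
├── resources/
│   └── my_project.job.yml    # relative paths like ../dist/*.whl resolve from here
├── src/
│   ├── notebook.ipynb
│   └── my_project/           # Python package built into the wheel
└── dist/                     # *.whl produced when the bundle builds artifacts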

Update the myproject.job.yml to include a dbt_task with a git_source. I've added comments (# NEW CODE STARTS HERE and # NEW CODE ENDS HERE) to mark the code I added.

# The main job for my_project.
resources:
  jobs:
    my_project_job:
      name: my_project_job

      trigger:
        # Run this job every day, exactly one day from the last run; see https://docs.databricks.com/api/workspace/jobs/create#trigger
        periodic:
          interval: 1
          unit: DAYS

      email_notifications:
        on_failure:
          - some_email@example.com

      tasks:
        - task_key: notebook_task
          job_cluster_key: job_cluster
          notebook_task:
            notebook_path: ../src/notebook.ipynb
        
        - task_key: main_task
          depends_on:
            - task_key: notebook_task
          
          job_cluster_key: job_cluster
          python_wheel_task:
            package_name: my_project
            entry_point: main
          libraries:
            # By default we just include the .whl file generated for the my_project package.
            # See https://docs.databricks.com/dev-tools/bundles/library-dependencies.html
            # for more information on how to add other libraries.
            - whl: ../dist/*.whl

        # NEW CODE STARTS HERE
        
        - task_key: "example_dbt_task"
          depends_on: 
            - task_key: "main_task"
          job_cluster_key: "job_cluster"
          libraries:
            - pypi:
                package: "dbt-databricks==1.8.0"
            - pypi:
                package: "dbt-core==1.8.0"
          dbt_task:
            commands:
              - "dbt deps"
              - "dbt build"
            source: GIT
      git_source:
        git_url: "https://github.com/dbt-labs/jaffle-shop-classic"
        git_provider: "gitHub"
        git_branch: "main"


      # NEW CODE ENDS HERE

      job_clusters:
        - job_cluster_key: job_cluster
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: Standard_D3_v2
            autoscale:
                min_workers: 1
                max_workers: 4

Running databricks bundle deploy now results in an error:

Error: no files match pattern: ../dist/*.whl
  at resources.jobs.my_project_job.tasks[1].libraries[0].whl
  in resources/my_project.job.yml:35:15
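
For reference, the deploy that triggers this is run from the bundle root; assuming the template's default dev target, the invocation looks like:

databricks bundle validate
databricks bundle deploy -t dev

(databricks bundle validate is a quick configuration check before deploying.)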

ACCEPTED SOLUTION

madams
Contributor

Thanks for providing that whole example; it was really easy to fiddle with. I think I've found your solution. Update the original two tasks on the job (if you want to keep them) like this:

      tasks:
        - task_key: notebook_task
          job_cluster_key: job_cluster
          notebook_task:
            notebook_path: ${workspace.file_path}/src/notebook.ipynb
            source: WORKSPACE
        
        - task_key: main_task
          depends_on:
            - task_key: notebook_task
          
          job_cluster_key: job_cluster
          python_wheel_task:
            package_name: my_project
            entry_point: main
          libraries:
            # By default we just include the .whl file generated for the my_project package.
            # See https://docs.databricks.com/dev-tools/bundles/library-dependencies.html
            # for more information on how to add other libraries.
            - whl: ${workspace.file_path}/dist/*.whl

Instead of using `../` in the path, I used the variable `${workspace.file_path}`, which references the deployed path. I also added `source: WORKSPACE` to your notebook task so that it doesn't default to GIT.
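
If it's useful to see what that variable points at: by default (unless overridden in your databricks.yml), ${workspace.file_path} resolves to the files folder under the bundle's deployment root. A minimal sketch of those defaults, for illustration only:

# illustrative defaults (assumptions; adjust in databricks.yml if needed)
workspace:
  root_path: ~/.bundle/${bundle.name}/${bundle.target}  # bundle deployment root
  file_path: ${workspace.root_path}/files               # what ${workspace.file_path} resolves to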

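Putting it together: under these assumptions, the dbt task keeps source: GIT alongside the job-level git_source (its dbt project lives in the repo), while the notebook and wheel tasks reference the deployed workspace files explicitly. A sketch of the full tasks section:

      tasks:
        - task_key: notebook_task
          job_cluster_key: job_cluster
          notebook_task:
            notebook_path: ${workspace.file_path}/src/notebook.ipynb
            source: WORKSPACE

        - task_key: main_task
          depends_on:
            - task_key: notebook_task
          job_cluster_key: job_cluster
          python_wheel_task:
            package_name: my_project
            entry_point: main
          libraries:
            - whl: ${workspace.file_path}/dist/*.whl

        - task_key: example_dbt_task
          depends_on:
            - task_key: main_task
          job_cluster_key: job_cluster
          libraries:
            - pypi:
                package: dbt-databricks==1.8.0
            - pypi:
                package: dbt-core==1.8.0
          dbt_task:
            commands:
              - dbt deps
              - dbt build
            # the dbt project comes from the repo below, so GIT stays correct here
            source: GIT
      git_source:
        git_url: https://github.com/dbt-labs/jaffle-shop-classic
        git_provider: gitHub
        git_branch: main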


stevewb
Visitor

Thank you, that worked! I spent several hours trying to work out what was going wrong there 😅
