databricks bundle deploy fails when job includes dbt task and git_source

stevewb
New Contributor III

I am trying to deploy a dbt task as part of a databricks job using databricks asset bundles.

However, adding a dbt task with a git_source to the job seems to clash with the existing tasks and causes a bizarre deploy failure.

I am using v0.237.0 of the CLI.

Minimal reproducible example:

Start with 

databricks bundle init default-python

Update my_project.job.yml to include a dbt_task with a git_source. The code I added is marked with comments (# NEW CODE STARTS HERE and # NEW CODE ENDS HERE).


# The main job for my_project.
resources:
  jobs:
    my_project_job:
      name: my_project_job

      trigger:
        # Run this job every day, exactly one day from the last run; see https://docs.databricks.com/api/workspace/jobs/create#trigger
        periodic:
          interval: 1
          unit: DAYS

      email_notifications:
        on_failure:
          - some_email@example.com

      tasks:
        - task_key: notebook_task
          job_cluster_key: job_cluster
          notebook_task:
            notebook_path: ../src/notebook.ipynb
        
        - task_key: main_task
          depends_on:
            - task_key: notebook_task
          
          job_cluster_key: job_cluster
          python_wheel_task:
            package_name: my_project
            entry_point: main
          libraries:
            # By default we just include the .whl file generated for the my_project package.
            # See https://docs.databricks.com/dev-tools/bundles/library-dependencies.html
            # for more information on how to add other libraries.
            - whl: ../dist/*.whl

        # NEW CODE STARTS HERE
        
        - task_key: "example_dbt_task"
          depends_on: 
            - task_key: "main_task"
          job_cluster_key: "job_cluster"
          libraries:
            - pypi:
                package: "dbt-databricks==1.8.0"
            - pypi:
                package: "dbt-core==1.8.0"
          dbt_task:
            commands:
              - "dbt deps"
              - "dbt build"
            source: GIT
      git_source:
        git_url: "https://github.com/dbt-labs/jaffle-shop-classic"
        git_provider: "gitHub"
        git_branch: "main"


      # NEW CODE ENDS HERE

      job_clusters:
        - job_cluster_key: job_cluster
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: Standard_D3_v2
            autoscale:
                min_workers: 1
                max_workers: 4


Running databricks bundle deploy now results in an error:


Error: no files match pattern: ../dist/*.whl
  at resources.jobs.my_project_job.tasks[1].libraries[0].whl
  in resources/my_project.job.yml:35:15


madams
Contributor III

Thanks for providing that whole example; it was really easy to fiddle with. I think I've found your solution. Update the original two tasks on the job (if you want to keep them) like this:

      tasks:
        - task_key: notebook_task
          job_cluster_key: job_cluster
          notebook_task:
            notebook_path: ${workspace.file_path}/src/notebook.ipynb
            source: WORKSPACE
        
        - task_key: main_task
          depends_on:
            - task_key: notebook_task
          
          job_cluster_key: job_cluster
          python_wheel_task:
            package_name: my_project
            entry_point: main
          libraries:
            # By default we just include the .whl file generated for the my_project package.
            # See https://docs.databricks.com/dev-tools/bundles/library-dependencies.html
            # for more information on how to add other libraries.
            - whl: ${workspace.file_path}/dist/*.whl

Instead of using `../` in the paths, I used the variable `${workspace.file_path}`, which references the deployed path. I also added `source: WORKSPACE` to your notebook task so that it didn't default to GIT, which is what happens to tasks without an explicit source once the job has a `git_source`.
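
For reference, here is a rough sketch of how the whole job might look with the dbt task from the question added back in. It only combines the snippets already posted in this thread (same task keys, paths, repository, and cluster settings) and has not been verified beyond what is reported here, so treat it as a starting point and not a definitive config.

```yaml
# Sketch only: assembled from the snippets in this thread, not independently tested.
resources:
  jobs:
    my_project_job:
      name: my_project_job

      # Job-level git_source: tasks without an explicit `source` default to GIT.
      git_source:
        git_url: "https://github.com/dbt-labs/jaffle-shop-classic"
        git_provider: "gitHub"
        git_branch: "main"

      tasks:
        # Notebook deployed by the bundle: pin it to the workspace copy.
        - task_key: notebook_task
          job_cluster_key: job_cluster
          notebook_task:
            notebook_path: ${workspace.file_path}/src/notebook.ipynb
            source: WORKSPACE

        # Wheel referenced through the deployed path instead of ../dist.
        - task_key: main_task
          depends_on:
            - task_key: notebook_task
          job_cluster_key: job_cluster
          python_wheel_task:
            package_name: my_project
            entry_point: main
          libraries:
            - whl: ${workspace.file_path}/dist/*.whl

        # The only task meant to read from the Git repository.
        - task_key: example_dbt_task
          depends_on:
            - task_key: main_task
          job_cluster_key: job_cluster
          libraries:
            - pypi:
                package: "dbt-databricks==1.8.0"
            - pypi:
                package: "dbt-core==1.8.0"
          dbt_task:
            commands:
              - "dbt deps"
              - "dbt build"
            source: GIT

      job_clusters:
        - job_cluster_key: job_cluster
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: Standard_D3_v2
            autoscale:
              min_workers: 1
              max_workers: 4
```

Running `databricks bundle validate` before `databricks bundle deploy` is a cheap way to check the file, although it may not catch every path problem.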


stevewb
New Contributor III

Thank you, that worked! I spent several hours trying to work out what was going wrong there 😅