Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Job deploy with git source using Asset Bundles

jonhieb
New Contributor III

Hi, I'm trying to deploy a job with a notebook task that reads its notebook from a Git source, but I get an error on deploy.

This is the YAML file:


resources:
  jobs:
    data_quality_pipelines_job:
      name: schedule_data_quality_job

      schedule:
        quartz_cron_expression: "0 0 5 ? * Mon-Fri" # At 5:00:00am, every day between Monday and Friday, every month
        timezone_id: "America/Sao_Paulo"

      timeout_seconds: 7200 # 2 hours

      git_source:
        git_branch: main
        git_provider: gitHub
        git_url: https://github.com/xxxx/yyyyyyy

      email_notifications:
        on_failure:
          - ${workspace.current_user.userName}
      webhook_notifications:
        on_failure:
          - id: xxxxxx-xxxxxxx-xxxxxx

      tasks:
        - task_key: data_quality_task
          pipeline_task:
            pipeline_id: xxxxxxx-xxxxxxx-xxxxxxx
            full_refresh: false

        - task_key: notify_business_areas
          depends_on:
            - task_key: data_quality_task
          notebook_task:
            notebook_path: src/notifications/send_notifications.ipynb
          max_retries: 0

      parameters:
        - name: notification_conf_file
          default: ${var.notification_conf_file}

When I try to deploy it, this error appears:


Error: terraform apply: exit status 1

Error: cannot update job: Invalid notebook_path: src/notifications/send_notifications.ipynb. Only absolute paths are currently supported. Paths must begin with '/'.

with databricks_job.kpis_analytics_job,
on bundle.tf.json line 79, in resource.databricks_job.kpis_analytics_job:
79: }

If I add a leading '/', the path is resolved against my Databricks workspace, but I want the notebook to be read from the Git repository. I know that the path exists in the repository at my Git URL.

What should I do?

1 ACCEPTED SOLUTION

Accepted Solutions

jonhieb
New Contributor III

I found the solution for this one:

When we’re running a job that needs to consume files from a Git repo, in addition to declaring the git_source clause, we also need to declare the source clause within the task configuration.

The image below demonstrates an example:

[Screenshot: jonhieb_0-1744037232118.png]

Notice that inside the notebook_task clause, I declared the source. That was enough to make it work.
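
For readers who can't see the screenshot, here is a minimal sketch of what such a task might look like. It is not a transcription of the image. It assumes the `source: GIT` setting that the Jobs API accepts on `notebook_task`, and it reuses the job, task, and repository names from the question. All other job settings are left out for brevity.

```yaml
resources:
  jobs:
    data_quality_pipelines_job:
      git_source:
        git_url: https://github.com/xxxx/yyyyyyy
        git_provider: gitHub
        git_branch: main

      tasks:
        - task_key: notify_business_areas
          notebook_task:
            # Relative to the repository root, so no leading '/'
            notebook_path: src/notifications/send_notifications.ipynb
            # Assumption: GIT makes the job resolve the notebook from git_source
            # instead of looking for it in the workspace
            source: GIT
```

With `source` set this way, the relative `notebook_path` should no longer trigger the "Only absolute paths are currently supported" error, since that check applies to workspace paths.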
