
How to overwrite git_source configuration in Asset Bundles

Ariaa
New Contributor II

I'm using DAB to deploy a "jobs" resource to Databricks in two environments: "dev" and "prod". I pull the notebooks from a remote Git repository using "git_source", and the default job uses a tag to select which version to pull. When deploying to "dev", however, I would like to use a branch instead of a tag. I can set the branch in the target configuration for "dev", but the merged configuration then keeps the tag alongside the branch, which makes the deployment fail. So the question is: how do I drop the tag where a branch is defined?
Here is a snippet of my config files:
default_job.yml:

resources:
  jobs:
    job_name:
      name: "name"
      
      ...

      git_source:
        git_url: "<git_url>"
        git_provider: "<provider>"
        git_tag: "<tag>"

      ...

dev.yml:

targets:
  dev: 
    mode: development

  ...

    resources:
      jobs:
        job_name:
          git_source:
            git_branch: "<branch>"

This results in the following config:

"git_source": {
  "git_branch": "<branch>",
  "git_provider": "<provider>",
  "git_tag": "<tag>",
  "git_url": "<git_url>"
}

 

3 REPLIES

Kaniz_Fatma
Community Manager

Hi @Ariaa, it sounds like you’re working with Databricks Asset Bundles (DAB) to deploy jobs into different environments.

Let’s address your specific scenario.

In your configuration, you’ve defined a default job that uses a tag to determine which version of the notebooks to pull from your remote Git repository. However, for the “dev” environment, you’d like to use a branch instead of a tag. The challenge is that when you define a branch in the target configuration for “dev,” it still retains the tag, leading to deployment issues.

To achieve your desired behavior, you can modify your configuration as follows:

  1. Default Job Configuration (default_job.yml):

    • Keep the existing configuration as it is, using the git_tag to pull the notebooks.
    • This configuration will be used for all environments except “dev.”
  2. Development Environment Configuration (dev.yml):

    • In the git_source section for the specific job (let’s call it job_name), set only the git_branch without specifying the git_tag.
    • This ensures that when deploying to the “dev” environment, the tag is ignored, and only the specified branch is used.

Here’s how your modified configuration would look:

# default_job.yml
resources:
  jobs:
    job_name:
      name: "name"
      
      ...

      git_source:
        git_url: "<git_url>"
        git_provider: "<provider>"
        git_tag: "<tag>"

      ...

# dev.yml
targets:
  dev:
    mode: development

  ...

  resources:
    jobs:
      job_name:
        git_source:
          git_branch: "<branch>"

With this setup, when deploying to the “dev” environment, Databricks will use the specified branch and ignore any associated tags. Your git_source configuration for the “dev” environment will look like this:

"git_source": {
  "git_branch": "<branch>",
  "git_provider": "<provider>",
  "git_url": "<git_url>"
}

Remember to replace <branch>, <provider>, and <git_url> with the actual values relevant to your project.

Happy deploying! 🚀

Ariaa
New Contributor II

Hi @Kaniz_Fatma, and thanks for replying. How does your solution differ from mine? Unless I'm missing something, the only difference is the indentation, which actually makes "resources" a new target!
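
To make the indentation point concrete, here is a small sketch of the two layouts, using the same placeholder names as the snippets above. It only illustrates how the YAML nesting is read; it is not a fix for the tag/branch problem.

```yaml
# Layout from the reply: "resources" is indented two spaces,
# so it sits next to "dev" and is read as a target named "resources".
targets:
  dev:
    mode: development

  resources:
    jobs:
      job_name:
        git_source:
          git_branch: "<branch>"

---
# Intended layout: "resources" is indented four spaces,
# so it is an override that belongs to the "dev" target.
targets:
  dev:
    mode: development

    resources:
      jobs:
        job_name:
          git_source:
            git_branch: "<branch>"
```

Even with the intended layout, the "dev" override is merged with the default job, so "git_tag" from the default is still present next to "git_branch".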

Husky
New Contributor III

I use target overrides to switch between branches and tags in different environments:

 

resources:
  jobs:
    my_job:
      git_source:
        git_url: <REPO-URL>
        git_provider: gitHub

targets:
  staging:
    resources:
      jobs:
        my_job:
          git_source:
            # Use Git branch for staging deploys 
            git_branch: ${var.git_branch}

  prod:
    resources:
      jobs:
        my_job:
          git_source:
            # Use Git tag for prod deploys 
            git_tag: ${var.git_tag}

 

A discussion of this can be found here: https://github.com/databricks/cli/issues/1255

You need to repeat that for every job you define, which can be a pain if you have many jobs.
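
Applied to the "dev" and "prod" setup from the question, the same pattern might look roughly like the sketch below. The variable names, descriptions, and the "main" default are assumptions for illustration; the job and placeholder values are copied from the original snippets. The key idea is that the shared "git_source" holds only the fields common to all targets, so no target inherits a tag or branch it does not want.

```yaml
# databricks.yml (sketch; variable names and defaults are placeholders)
variables:
  git_branch:
    description: Branch to pull notebooks from in non-production targets
    default: main
  git_tag:
    description: Release tag to pull notebooks from in production
    default: "<tag>"

resources:
  jobs:
    job_name:
      name: "name"
      git_source:
        # Only the fields shared by every target live here.
        git_url: "<git_url>"
        git_provider: "<provider>"

targets:
  dev:
    mode: development
    resources:
      jobs:
        job_name:
          git_source:
            # Branch is set only for dev
            git_branch: ${var.git_branch}

  prod:
    resources:
      jobs:
        job_name:
          git_source:
            # Tag is set only for prod
            git_tag: ${var.git_tag}
```

A variable value can then be supplied at deploy time, for example with `databricks bundle deploy -t dev --var="git_branch=<branch>"`, and `databricks bundle validate -t dev` is a convenient way to check the merged "git_source" before deploying.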
