Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.

How to overwrite git_source configuration in Asset Bundles

Ariaa
New Contributor II

I'm using DAB to deploy a "jobs" resource into Databricks in two environments: "dev" and "prod". I pull the notebooks from a remote Git repository using "git_source", and the default job uses a tag to decide which version to pull. When deploying to "dev", however, I would like to use a branch instead of a tag. I can set the branch in the target configuration for "dev", but the tag from the default job is kept alongside the branch, which makes the deployment fail. So the question is: how do I drop the tag where a branch is defined?
Here is a snippet of my config files:
default_job.yml:

resources:
  jobs:
    job_name:
      name: "name"
      
      ...

      git_source:
        git_url: "<git_url>"
        git_provider: "<provider>"
        git_tag: "<tag>"

      ...

dev.yml:

targets:
  dev: 
    mode: development

    ...

    resources:
      jobs:
        job_name:
          git_source:
            git_branch: "<branch>"

This results in the following merged config:

"git_source": {
  "git_branch": "<branch>",
  "git_provider": "<provider>",
  "git_tag": "<tag>",
  "git_url": "<git_url>"
}

 

3 REPLIES

Kaniz_Fatma
Community Manager

Hi @Ariaa, it sounds like you’re working with Databricks Asset Bundles (DAB) to deploy jobs into different environments.

Let’s address your specific scenario.

In your configuration, you’ve defined a default job that uses a tag to determine which version of the notebooks to pull from your remote Git repository. However, for the “dev” environment, you’d like to use a branch instead of a tag. The challenge is that when you define a branch in the target configuration for “dev,” it still retains the tag, leading to deployment issues.

To achieve your desired behavior, you can modify your configuration as follows:

  1. Default Job Configuration (default_job.yml):

    • Keep the existing configuration as it is, using the git_tag to pull the notebooks.
    • This configuration will be used for all environments except “dev.”
  2. Development Environment Configuration (dev.yml):

    • In the git_source section for the specific job (let’s call it job_name), set only the git_branch without specifying the git_tag.
    • This ensures that when deploying to the “dev” environment, the tag is ignored, and only the specified branch is used.

Here’s how your modified configuration would look:

# default_job.yml
resources:
  jobs:
    job_name:
      name: "name"
      
      ...

      git_source:
        git_url: "<git_url>"
        git_provider: "<provider>"
        git_tag: "<tag>"

      ...

# dev.yml
targets:
  dev:
    mode: development

  ...

  resources:
    jobs:
      job_name:
        git_source:
          git_branch: "<branch>"

With this setup, when deploying to the “dev” environment, Databricks will use the specified branch and ignore any associated tags. Your git_source configuration for the “dev” environment will look like this:

"git_source": {
  "git_branch": "<branch>",
  "git_provider": "<provider>",
  "git_url": "<git_url>"
}

Remember to replace <branch>, <provider>, and <git_url> with the actual values relevant to your project.

Happy deploying! 🚀

Ariaa
New Contributor II

Hi @Kaniz_Fatma, and thanks for replying. How does your solution differ from mine? Unless I'm missing something, the only difference is the indentation, which actually makes "resources" a new target!
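
To make the indentation point concrete: in the dev.yml from the reply above, `resources` sits at the same level as `dev`, so it would be read as a second target named `resources`. A sketch of the nesting that keeps the override inside the `dev` target, using the same placeholders as the question, might look like this:

```yaml
# dev.yml (sketch; placeholders as in the question)
targets:
  dev:
    mode: development
    # `resources` is nested under `dev`, so it overrides this target only
    resources:
      jobs:
        job_name:
          git_source:
            git_branch: "<branch>"
```

As described in the question, this override is still merged with the job's default `git_source`, so the `git_tag` from default_job.yml remains in the result unless it is moved out of the default definition.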

Husky
New Contributor III

I use target overrides to switch between branch and tags on different environments:

 

resources:
  jobs:
    my_job:
      git_source:
        git_url: <REPO-URL>
        git_provider: gitHub

targets:
  staging:
    resources:
      jobs:
        my_job:
          git_source:
            # Use Git branch for staging deploys 
            git_branch: ${var.git_branch}

  prod:
    resources:
      jobs:
        my_job:
          git_source:
            # Use Git tag for prod deploys 
            git_tag: ${var.git_tag}

 

 Discussion about that can be found here: https://github.com/databricks/cli/issues/1255

You need to repeat that for every job you define, which can be a pain if you have many jobs.
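
The snippet above references `${var.git_branch}` and `${var.git_tag}` without showing where they are declared. A minimal sketch of what the declarations might look like, assuming the bundle's top-level `variables` mapping; the descriptions and the `main` default are illustrative and not taken from the original post:

```yaml
# databricks.yml (sketch; descriptions and default are assumptions)
variables:
  git_branch:
    description: Branch used for staging deploys
    default: main   # assumed default; adjust to your repository
  git_tag:
    # No default here, so a value has to be supplied at deploy time
    description: Tag used for prod deploys
```

A value could then be passed when deploying, for example with `databricks bundle deploy -t prod --var="git_tag=<tag>"`.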
