02-26-2024 01:49 AM
I'm using DAB to deploy a "jobs" resource into Databricks, into two environments: "dev" and "prod". I pull the notebooks from a remote git repository using "git_source", and defined the default job to use a tag to select which version to pull. However, when deploying to "dev" I would like to use a branch instead of a tag. This can be set in the target configuration for "dev", but the problem is that the tag is kept along with the branch, which fails the deployment. So the question is: how do I ignore the tag where a branch is defined?
Here is a snippet of my config files:
default_job.yml:
resources:
  jobs:
    job_name:
      name: "name"
      ...
      git_source:
        git_url: "<git_url>"
        git_provider: "<provider>"
        git_tag: "<tag>"
      ...
dev.yml:
targets:
  dev:
    mode: development
    ...
    resources:
      jobs:
        job_name:
          git_source:
            git_branch: "<branch>"
This ends up as the following config:
"git_source": {
  "git_branch": "<branch>",
  "git_provider": "<provider>",
  "git_tag": "<tag>",
  "git_url": "<git_url>"
}
02-28-2024 05:23 AM
Hi @Ariaa, It sounds like you’re working with Databricks Asset Bundles (DAB) to deploy jobs into different environments.
Let’s address your specific scenario.
In your configuration, you’ve defined a default job that uses a tag to determine which version of the notebooks to pull from your remote Git repository. However, for the “dev” environment, you’d like to use a branch instead of a tag. The challenge is that when you define a branch in the target configuration for “dev,” it still retains the tag, leading to deployment issues.
To achieve your desired behavior, you can modify your configuration as follows:
Default Job Configuration (default_job.yml): keep git_tag to pull the notebooks.
Development Environment Configuration (dev.yml): in the git_source section for the specific job (let’s call it job_name), set only the git_branch without specifying the git_tag.
Here’s how your modified configuration would look:
# default_job.yml
resources:
  jobs:
    job_name:
      name: "name"
      ...
      git_source:
        git_url: "<git_url>"
        git_provider: "<provider>"
        git_tag: "<tag>"
      ...
# dev.yml
targets:
  dev:
    mode: development
    ...
  resources:
    jobs:
      job_name:
        git_source:
          git_branch: "<branch>"
With this setup, when deploying to the “dev” environment, Databricks will use the specified branch and ignore any associated tags. Your git_source configuration for the “dev” environment will look like this:
"git_source": {
  "git_branch": "<branch>",
  "git_provider": "<provider>",
  "git_url": "<git_url>"
}
Remember to replace <branch>, <provider>, and <git_url> with the actual values relevant to your project.
Happy deploying! 🚀
02-28-2024 06:04 AM
Hi @Kaniz_Fatma, and thanks for replying. How does your solution differ from mine? Unless I'm missing something, the only difference is the indentation, which actually makes "resources" a new target!
05-17-2024 01:18 AM
I use target overrides to switch between branch and tags on different environments:
resources:
  jobs:
    my_job:
      git_source:
        git_url: <REPO-URL>
        git_provider: gitHub
targets:
  staging:
    resources:
      jobs:
        my_job:
          git_source:
            # Use Git branch for staging deploys
            git_branch: ${var.git_branch}
  prod:
    resources:
      jobs:
        my_job:
          git_source:
            # Use Git tag for prod deploys
            git_tag: ${var.git_tag}
Discussion about that can be found here: https://github.com/databricks/cli/issues/1255
You need to repeat that for every job you define, which can be a pain if you have many jobs.
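The overrides above reference ${var.git_branch} and ${var.git_tag} without showing where they are declared. Here is a minimal sketch of a matching variables block, assuming it sits at the top level of databricks.yml. The descriptions and the default value "main" are illustrative assumptions, not taken from the original setup:

```yaml
# databricks.yml (top level). Variable names match the ${var.*} references
# used in the staging and prod target overrides.
variables:
  git_branch:
    description: Branch the staging job checks out
    default: main  # assumed default, replace with your own branch
  git_tag:
    description: Tag the prod job checks out
    # No default here, so a prod deploy has to supply a tag explicitly.
```

A value can then be passed at deploy time, for example: databricks bundle deploy -t prod --var="git_tag=<tag>". Since the top-level git_source carries neither git_tag nor git_branch, each target should contribute exactly one of them to the merged job.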