12-10-2025 01:51 AM
Hi all
We are using Databricks Asset Bundles for our data science / ML projects. The asset bundle we have has spawned quite a few projects by now, and now we need to make some updates to the bundle. Those updates should also be applied to the spawned projects.
So my question is: how do we handle this? Is there a feature integrated with Databricks Asset Bundles, or would we need to look in a different direction?
I know there are some tools compatible with cookiecutter templates, where you can update the cookiecutter template and then apply the changes to the projects spawned from it. However, I can't seem to find anything equivalent from a Databricks Asset Bundles perspective. I think it is quite an issue, honestly.
Kind regards
Niels
12-10-2025 08:43 AM
Greetings @Sleiny,
Here's what's really going on, plus a pragmatic, field-tested plan you can actually execute without tearing up your repo strategy.
Let's dig in.
What's happening
Databricks Asset Bundles templates are used at initialization time via databricks bundle init, either from default templates or from your own custom ones. They're great for standardizing how projects start. The key detail is that templates are explicitly positioned as one-time scaffolding. The docs cover how to create them, share them, and initialize bundles from them, but there is no built-in mechanism to "re-apply" template changes back onto projects that were already spawned.
Once a bundle exists, it's just configuration in source control: typically YAML (or Python), with databricks.yml at the root. You can compose that configuration using include, along with other top-level constructs like git metadata, scripts, and sync behavior. This makes bundles modular, but again, that modularity is your responsibility to design.
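To make that concrete, here is a minimal sketch of what such a databricks.yml can look like; the bundle name, paths, and workspace host are illustrative placeholders, not anything from your environment:

```yaml
# databricks.yml -- minimal sketch; the bundle name, paths, and host are placeholders
bundle:
  name: churn-model            # hypothetical project name

include:
  - resources/*.yml            # per-project job and pipeline definitions
  - shared/*.yml               # fragments you intend to keep in sync across projects

targets:
  dev:
    mode: development
    default: true
  prod:
    mode: production
    workspace:
      host: https://adb-1234567890123456.7.azuredatabricks.net   # placeholder URL
```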
For shared logic across many projects, the right abstraction is not copy-paste; it's centrally versioned libraries. Wheels, JARs, and PyPI packages can be referenced directly in bundle job tasks so that "common code" lives in one place instead of being scattered across a dozen repos.
Bundles also give you workflows like bundle generate and bundle deployment bind, but those are about keeping a single project's local configuration aligned with what's already deployed. They are not designed to propagate template evolution across multiple projects.
Implications
There is no native "cookiecutter update" equivalent for Databricks Asset Bundles. Template updates do not automatically fan out to existing projects. That said, bundles are git-first and composable, which means you can implement clean, scalable patterns that solve this problem in a very software-engineering-native way.
Recommended action plan
Separate shared code from per-project code using central libraries
Move shared DS/ML logic (training loops, utilities, feature engineering, common jobs) into one or more versioned packages. Publish those as wheels or JARs into Unity Catalog volumes or workspace files. Then each bundle simply references the shared artifact. Updating behavior becomes a version bump plus a redeploy, not a repo-wide refactor.
Once you do this, drift essentially disappears at the code layer.
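As a sketch of what the consuming side looks like, assuming a hypothetical shared package org_ml_common published to a Unity Catalog volume, a job task can pull it in through its libraries list:

```yaml
# resources/train_job.yml -- sketch of a task consuming a centrally published wheel;
# the volume path, package name, and notebook path are hypothetical, and the
# cluster/serverless compute config is omitted for brevity
resources:
  jobs:
    train_model:
      name: train-model
      tasks:
        - task_key: train
          notebook_task:
            notebook_path: ../src/train.py
          libraries:
            - whl: /Volumes/ml/shared/libs/org_ml_common-1.4.0-py3-none-any.whl
```

If you prefer, expose the version as a bundle variable so an automated bump (see the automation step below) only has to touch one line per project.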
Externalize shared bundle configuration and include it
Create a small "org-bundle-base" repo that holds your standard YAML fragments: compute presets, job conventions, cluster policies, security defaults, tagging, all of it. In each project, use include in databricks.yml to pull those fragments in. Manage that shared repo via a Git submodule or a pinned vendor path.
Now you have one place to edit common configuration, and updating a project becomes a simple submodule bump, validate, and deploy.
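For illustration, assuming the shared repo is checked out as a submodule named org-bundle-base at the bundle root, the wiring could look like this; the fragment contents and variable names are just examples of the kind of defaults you might centralize:

```yaml
# databricks.yml (per project) -- pull shared fragments in from the submodule
include:
  - org-bundle-base/fragments/*.yml

# org-bundle-base/fragments/compute-presets.yml -- one shared fragment defining
# org-wide defaults that project jobs reference as ${var.default_node_type}, etc.
variables:
  default_node_type:
    description: Standard node type for DS/ML job clusters
    default: Standard_DS3_v2
  default_spark_version:
    description: Org-approved Databricks Runtime version
    default: 15.4.x-scala2.12
```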
If your projects came from a template repo, use an upstream merge model
If your bundles were initialized from a Git-hosted template, treat those projects like forks. Add the template repo as an upstream remote and periodically merge or rebase in changes. The docs allow templates to come from Git URLs, but they don't establish a persistent linkage; this upstream model is how you operationalize template evolution in practice.
Automate propagation when scale kicks in
If you're managing dozens of repos, you don't want humans doing this manually. Script it.
Automate submodule bumps, shared package version updates, and config edits via CI (GitHub Actions, Azure DevOps, etc.). Have those workflows open PRs across repos. Gate everything with databricks bundle validate and lightweight smoke runs in dev before anything hits prod. This keeps your fleet consistent without centralizing everything into one monorepo.
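As one possible shape for that automation, assuming GitHub Actions, a submodule named org-bundle-base, and the usual DATABRICKS_HOST/DATABRICKS_TOKEN secrets, a scheduled workflow can bump the shared config, validate against dev, and open a PR (the create-pull-request step here is the community peter-evans action):

```yaml
# .github/workflows/bump-org-bundle-base.yml -- sketch only; the schedule, branch
# names, and secrets are assumptions about your CI setup
name: bump-org-bundle-base
on:
  schedule:
    - cron: "0 6 * * 1"        # weekly, Monday morning
  workflow_dispatch: {}

jobs:
  bump:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          submodules: true
      - name: Bump the shared-config submodule to its latest commit
        run: git submodule update --remote org-bundle-base
      - uses: databricks/setup-cli@main
      - name: Validate the bundle against the dev target
        run: databricks bundle validate -t dev
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
      - name: Open a PR with the bump
        uses: peter-evans/create-pull-request@v6
        with:
          branch: chore/bump-org-bundle-base
          title: "chore: bump org-bundle-base"
          commit-message: "chore: bump org-bundle-base submodule"
```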
Use generate/bind for drift control, not for template sync
When you're updating definitions for jobs and pipelines that already exist in a workspace, bundle generate and bundle deployment bind are your safety rails. They help keep each project's local state aligned with deployed reality while you're rolling out broader repo-level changes. Think of this as drift control, not template propagation.
Suggested next steps
First, inventory your changes and cleanly separate shared code from shared configuration.
Second, stand up a central shared-library pipeline and migrate projects to consume versioned artifacts.
Third, create your org-level base bundle repo and wire it in with include.
Finally, automate the update flow so this becomes routine instead of a quarterly fire drill.
Bottom line
There is no built-in Databricks Asset Bundles feature that automatically re-applies template changes to existing projects. The right solution is git-native: shared base repos via include, upstream merges for template-derived projects, and centrally versioned wheels and JARs for shared logic. Once you adopt those patterns, rolling out updates becomes predictable, low-touch, and safe.
Hope this helps, Louis.