02-19-2026 03:35 PM
Hi all,
I recently read this post, and it was insightful for me because I had never seen an existing cluster extended by inheriting a previously defined cluster configuration and adding on top of it.
It made me wonder whether others might also be interested in Databricks Asset Bundle YAML file samples for different scenarios, especially more complex uses of the Terraform-style variables and strategies for organizing YAML for more complex workloads.
----
On a side topic, unrelated to DABs, I was looking to see if I could filter the Data Engineering forum for a DAB or 'Asset Bundle' tag but it seems that I can't scroll through the Tags list on the sidebar. Is this an issue on my machine only, or do others see the same?
Thanks,
Danny
2 weeks ago
Great question -- and I think a lot of people will find this thread useful. There are actually quite a few official resources for DAB YAML samples and patterns. Let me round them up for you.
OFFICIAL BUNDLE EXAMPLES REPOSITORY
The best single resource is the official bundle-examples GitHub repo maintained by Databricks. It contains 16+ complete, working examples covering a wide range of scenarios:
https://github.com/databricks/bundle-examples
The knowledge_base folder in that repo has examples for:
- app_with_database -- A Databricks app backed by an OLTP Postgres database
- dashboard_nyc_taxi -- An AI/BI dashboard with a snapshot job
- database_with_catalog -- Defines an OLTP database instance and catalog
- databricks_app -- Defines a Databricks App
- development_cluster -- Defines and uses a development (all-purpose) cluster
- job_read_secret -- Defines a secret scope and a job that reads from it
- job_with_multiple_wheels -- A job with multiple wheel dependencies
- job_with_run_job_tasks -- Multiple jobs with run job tasks
- job_with_sql_notebook -- A job using a SQL notebook task
- pipeline_with_schema -- A Unity Catalog schema and pipeline that uses it
- private_wheel_packages -- Uses a private wheel package from a job
- python_wheel_poetry -- Builds a whl with Poetry
- serverless_job -- Uses serverless compute to run a job
- share_files_across_bundles -- Includes files from outside the bundle root
- spark_jar_task -- Defines and uses a Spark JAR task
- write_from_job_to_volume -- Writes a file to a UC volume
The repo also has full template examples (default_python, default_sql, lakeflow_pipelines_python, lakeflow_pipelines_sql, mlops_stacks, pydabs, and more).
DOCUMENTATION EXAMPLES PAGE
The docs have a dedicated page with inline YAML samples covering common configuration patterns such as JAR uploads to Unity Catalog, dashboard parameterization with variables, serverless jobs, multi-wheel dependencies, job parameters, requirements.txt integration, scheduled jobs with cron expressions, and serverless pipelines:
https://docs.databricks.com/en/dev-tools/bundles/examples.html
VARIABLES AND COMPLEX CONFIGURATIONS
Since you mentioned the complex variables pattern from the other post, here are the key docs for advanced variable usage:
Variables reference:
https://docs.databricks.com/en/dev-tools/bundles/variables.html
This covers:
- Defining custom variables with defaults and descriptions
- Complex variables using type: complex (for nested structures like full cluster definitions)
- Variable substitution syntax: ${var.my_variable}
- Precedence order: CLI flags > env vars > override files > target config > defaults
- Object lookups (resolve a cluster/warehouse/pipeline ID by name)
For example, you can define a complex variable for an entire cluster configuration and then reference it across multiple jobs -- which is the inheritance pattern you saw in that other post.
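As a rough sketch of both patterns (the variable, job, and warehouse names here are illustrative, not from the docs), a complex cluster variable plus a name-based lookup might look like this:

```yaml
# databricks.yml (sketch; variable and job names are illustrative)
variables:
  base_cluster:
    description: Shared cluster definition reused across jobs
    type: complex
    default:
      spark_version: 15.4.x-scala2.12
      node_type_id: i3.xlarge
      num_workers: 2
  analytics_warehouse_id:
    description: Resolve an existing SQL warehouse ID by its display name
    lookup:
      warehouse: "Shared Analytics Warehouse"

resources:
  jobs:
    etl_job:
      name: etl-job
      job_clusters:
        - job_cluster_key: main
          # The whole nested cluster object is substituted here
          new_cluster: ${var.base_cluster}
      tasks:
        - task_key: run_etl
          job_cluster_key: main
          notebook_task:
            notebook_path: ../src/etl.ipynb
```

A target or an override file can then replace base_cluster wholesale -- for example, swapping in a larger node_type_id for prod -- without touching any of the job definitions that reference it.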
SPLITTING AND ORGANIZING YAML FOR COMPLEX WORKLOADS
For larger projects, the include mapping is your best friend. You can split your bundle configuration across multiple files:
# databricks.yml
bundle:
  name: my-project

include:
  - "resources/*.yml"
  - "targets/*.yml"
This lets you organize by concern -- one file per job, separate files for targets/environments, shared variable definitions, etc. The docs cover this here:
https://docs.databricks.com/en/dev-tools/bundles/settings.html
Key patterns for complex workloads:
1. Use include to split resources into separate files
2. Define shared settings at the top level and override per-target
3. Use complex variables to define reusable cluster configs or other nested objects
4. Use variable lookups to reference existing workspace resources by name
5. Use the override file (.databricks/bundle/<target>/variable-overrides.json) for environment-specific values without changing committed YAML
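To make the per-target override pattern concrete, here is a sketch (job name and cron values are made up for illustration): the job is defined once at the top level, and each target overrides only what differs:

```yaml
# databricks.yml (sketch; job name and schedules are illustrative)
resources:
  jobs:
    nightly_etl:
      name: nightly-etl
      schedule:
        quartz_cron_expression: "0 0 6 * * ?"
        timezone_id: UTC

targets:
  dev:
    default: true
    mode: development
  prod:
    mode: production
    resources:
      jobs:
        nightly_etl:
          # Only the schedule differs in prod; everything else is inherited
          schedule:
            quartz_cron_expression: "0 0 2 * * ?"
            timezone_id: UTC
```

Anything not overridden under targets.prod is inherited from the top-level definition, which keeps the environment-specific diff small and easy to review.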
BUNDLE TEMPLATES
If you want to create standardized project structures for your team, you can build custom bundle templates:
https://docs.databricks.com/en/dev-tools/bundles/templates.html
Databricks also ships 7 built-in templates you can initialize with:
databricks bundle init
The built-in options are: default-minimal, default-python, default-scala, default-sql, dbt-sql, mlops-stacks, and pydabs.
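If you go the custom-template route, a template declares its input parameters in a databricks_template_schema.json file at the template root. A minimal sketch (the property name and defaults are illustrative):

```json
{
  "welcome_message": "Creating a new team project...",
  "properties": {
    "project_name": {
      "type": "string",
      "description": "Name for the new bundle",
      "default": "my_project"
    }
  }
}
```

Files inside the template can then reference the value with Go template syntax such as {{.project_name}}, which databricks bundle init substitutes when the template is instantiated.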
VALIDATING YOUR CONFIGS
One tip for experimenting with complex YAML configurations -- you can validate and see the fully resolved configuration (with all variable substitutions and target overrides applied) by running:
databricks bundle validate --output json
This is very helpful for debugging inheritance and override behavior.
ABOUT THE TAG FILTERING
Regarding the sidebar tag filtering issue -- I can confirm that is not just your machine. The tag list in the sidebar does have usability limitations. Hopefully the community platform team can improve that over time.
I hope these resources help you and others build out more sophisticated DAB configurations. If you have a specific scenario you are trying to implement, feel free to post it and we can help with the YAML structure.
* This reply used an agent system I built to research and draft the response, based on the wide set of documentation I have available and previous memory. I personally review each draft for obvious issues, monitor the system's reliability, and update the reply when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand-new features.
02-20-2026 07:58 AM
I see the same. It seems tags are not being used wisely; the platform should auto-fill tags using AI 🙂
2 weeks ago
Fantastic, @SteveOstrowski! This is exactly the kind of resource I was looking for: it lets me understand what's possible and gives me templates I can use to get ahead in my work. Thank you!