Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

DAB YML Samples

Danny_Lee
Valued Contributor

Hi all,

I recently read this post and found it insightful -- I had never seen an existing cluster extended by inheriting a previously defined cluster and then adding configuration on top.

[Screenshot: Danny_Lee_0-1771543900750.png]

It made me wonder whether others might also be interested in Databricks Asset Bundle YAML samples for different scenarios -- especially more complex uses of bundle variables and strategies for organizing YAML files for more complex workloads.

----

On a side topic, unrelated to DABs: I wanted to filter the Data Engineering forum for a 'DAB' or 'Asset Bundle' tag, but it seems I can't scroll through the Tags list in the sidebar. Is this an issue on my machine only, or do others see the same?

[Screenshot: Danny_Lee_1-1771544096230.png]

Thanks,

Danny

 

--
The heart that breaks open can contain the whole universe. - Joanna Macy
1 ACCEPTED SOLUTION

SteveOstrowski
Databricks Employee

@Danny_Lee

Great question -- and I think a lot of people will find this thread useful. There are actually quite a few official resources for DAB YAML samples and patterns. Let me round them up for you.

OFFICIAL BUNDLE EXAMPLES REPOSITORY

The best single resource is the official bundle-examples GitHub repo maintained by Databricks. It contains 16+ complete, working examples covering a wide range of scenarios:

https://github.com/databricks/bundle-examples

The knowledge_base folder in that repo has examples for:

- app_with_database -- A Databricks app backed by an OLTP Postgres database
- dashboard_nyc_taxi -- An AI/BI dashboard with a snapshot job
- database_with_catalog -- Defines an OLTP database instance and catalog
- databricks_app -- Defines a Databricks App
- development_cluster -- Defines and uses a development (all-purpose) cluster
- job_read_secret -- Defines a secret scope and a job that reads from it
- job_with_multiple_wheels -- A job with multiple wheel dependencies
- job_with_run_job_tasks -- Multiple jobs with run job tasks
- job_with_sql_notebook -- A job using a SQL notebook task
- pipeline_with_schema -- A Unity Catalog schema and pipeline that uses it
- private_wheel_packages -- Uses a private wheel package from a job
- python_wheel_poetry -- Builds a whl with Poetry
- serverless_job -- Uses serverless compute to run a job
- share_files_across_bundles -- Includes files from outside the bundle root
- spark_jar_task -- Defines and uses a Spark JAR task
- write_from_job_to_volume -- Writes a file to a UC volume

The repo also has full template examples (default_python, default_sql, lakeflow_pipelines_python, lakeflow_pipelines_sql, mlops_stacks, pydabs, and more).

DOCUMENTATION EXAMPLES PAGE

The docs have a dedicated page with inline YAML samples covering common configuration patterns such as JAR uploads to Unity Catalog, dashboard parameterization with variables, serverless jobs, multi-wheel dependencies, job parameters, requirements.txt integration, scheduled jobs with cron expressions, and serverless pipelines:

https://docs.databricks.com/en/dev-tools/bundles/examples.html

VARIABLES AND COMPLEX CONFIGURATIONS

Since you mentioned the complex variables pattern from the other post, here are the key docs for advanced variable usage:

Variables reference:

https://docs.databricks.com/en/dev-tools/bundles/variables.html

This covers:

- Defining custom variables with defaults and descriptions
- Complex variables using type: complex (for nested structures like full cluster definitions)
- Variable substitution syntax: ${var.my_variable}
- Precedence order: CLI flags > env vars > override files > target config > defaults
- Object lookups (resolve a cluster/warehouse/pipeline ID by name)

For example, you can define a complex variable for an entire cluster configuration and then reference it across multiple jobs -- which is the inheritance pattern you saw in that other post.
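To make that concrete, here is a minimal sketch of a complex variable holding a full cluster definition that a job then references. All names, node types, and versions below are illustrative, not taken from the original post:

```yaml
# databricks.yml (sketch -- names, node types, and versions are illustrative)
variables:
  base_cluster:
    description: Shared cluster definition reused across jobs
    type: complex
    default:
      spark_version: 15.4.x-scala2.12
      node_type_id: i3.xlarge
      num_workers: 2

resources:
  jobs:
    nightly_etl:
      name: nightly-etl
      job_clusters:
        - job_cluster_key: main
          new_cluster: ${var.base_cluster}
```

Because the variable is typed `complex`, `${var.base_cluster}` expands to the whole nested mapping rather than a string, so several jobs can point at the same definition and a target can override just the variable.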

SPLITTING AND ORGANIZING YAML FOR COMPLEX WORKLOADS

For larger projects, the include mapping is your best friend. You can split your bundle configuration across multiple files:

# databricks.yml
bundle:
  name: my-project

include:
  - "resources/*.yml"
  - "targets/*.yml"

This lets you organize by concern -- one file per job, separate files for targets/environments, shared variable definitions, etc. The docs cover this here:

https://docs.databricks.com/en/dev-tools/bundles/settings.html
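As an illustration, one of the files pulled in by that include glob might look like this (the file name, job name, and notebook path are assumptions for the sketch):

```yaml
# resources/nightly_etl.yml (illustrative)
resources:
  jobs:
    nightly_etl:
      name: nightly-etl-${bundle.target}
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ../src/etl_notebook.ipynb
```

Interpolating `${bundle.target}` into the job name keeps deployments to different targets from colliding in the same workspace.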

Key patterns for complex workloads:

1. Use include to split resources into separate files
2. Define shared settings at the top level and override per-target
3. Use complex variables to define reusable cluster configs or other nested objects
4. Use variable lookups to reference existing workspace resources by name
5. Use the override file (.databricks/bundle/<target>/variable-overrides.json) for environment-specific values without changing committed YAML
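Pattern 2 (shared settings overridden per target) can be sketched like this -- the target names and cron schedule are hypothetical, and the job itself would be defined once at the top level or in an included file:

```yaml
# databricks.yml (sketch): prod layers a schedule on top of the shared job
targets:
  dev:
    default: true
    mode: development
  prod:
    mode: production
    resources:
      jobs:
        nightly_etl:
          schedule:
            quartz_cron_expression: "0 0 2 * * ?"
            timezone_id: UTC
```

At deploy time the target's settings are merged over the shared definition, so dev runs the job on demand while prod gets the nightly schedule.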

BUNDLE TEMPLATES

If you want to create standardized project structures for your team, you can build custom bundle templates:

https://docs.databricks.com/en/dev-tools/bundles/templates.html

Databricks also ships 7 built-in templates you can initialize with:

databricks bundle init

The built-in options are: default-minimal, default-python, default-scala, default-sql, dbt-sql, mlops-stacks, and pydabs.

VALIDATING YOUR CONFIGS

One tip for experimenting with complex YAML configurations -- you can validate and see the fully resolved configuration (with all variable substitutions and target overrides applied) by running:

databricks bundle validate --output json

This is very helpful for debugging inheritance and override behavior.

ABOUT THE TAG FILTERING

Regarding the sidebar tag filtering issue -- I can confirm that is not just your machine. The tag list in the sidebar does have usability limitations. Hopefully the community platform team can improve that over time.

I hope these resources help you and others build out more sophisticated DAB configurations. If you have a specific scenario you are trying to implement, feel free to post it and we can help with the YAML structure.

* This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.



3 REPLIES

saurabh18cs
Honored Contributor III

I see the same -- it seems tags are not used consistently. The platform should auto-fill tags using AI 🙂


Danny_Lee
Valued Contributor

Fantastic, @SteveOstrowski! This is exactly the kind of resource I was looking for -- it helps me understand what's possible and gives me templates I can use to get ahead on my work. Thank you!

--
The heart that breaks open can contain the whole universe. - Joanna Macy