Yes, you can define environment-specific constants at the bundle level in Databricks Asset Bundles and make them accessible inside Databricks notebooks without relying on task-level parameters. This can be done with bundle variables overridden per deployment target, with cluster environment variables, or with a config file read at notebook startup. Each approach has its own trade-offs in maintainability and accessibility.
Bundle-Level Constants in Databricks Asset Bundles
1. Using the Bundle Config File (databricks.yml and the targets Section)
Databricks Asset Bundles support environment-specific configuration in the databricks.yml file under the targets key (named environments in early releases of the bundle tooling). Variables are declared once at the top level and can then be overridden per target.
Example databricks.yml:

```yaml
variables:
  gold_catalog:
    description: Catalog name for gold tables
    default: gold_dev01

targets:
  dev:
    variables:
      gold_catalog: gold_dev01
  uat:
    variables:
      gold_catalog: gold_tst01
  prod:
    variables:
      gold_catalog: gold_prod01
```
These variables can then be referenced anywhere in the bundle configuration using the ${var.gold_catalog} substitution syntax, including in job and workflow definitions. To use them inside notebooks, you still need to surface them at runtime, for example as cluster environment variables or via notebook initialization logic.
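For example, a job resource in the bundle can interpolate the variable into a cluster environment variable. This is a minimal sketch; the job name, notebook path, and cluster settings below are hypothetical choices, not required names:

```yaml
resources:
  jobs:
    nightly_gold_load:
      name: nightly_gold_load
      job_clusters:
        - job_cluster_key: main
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: i3.xlarge
            num_workers: 2
            spark_env_vars:
              # Interpolated per target at deploy time
              GOLD_CATALOG: ${var.gold_catalog}
      tasks:
        - task_key: load_gold
          job_cluster_key: main
          notebook_task:
            notebook_path: ../src/load_gold.py
```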
2. Accessing Bundle Variables in Notebooks
Inside your notebook, you can read environment variables with standard library calls, for example in Python:

```python
import os

# The name matches the key set via spark_env_vars in the job definition
gold_catalog = os.environ.get("GOLD_CATALOG", "default_value")
```
For these environment variables to exist, something has to set them: on classic job clusters, spark_env_vars (as shown above) does the job. Where environment variables cannot be set natively (for example, on some serverless configurations), fall back to notebook widgets or parameters.
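As an illustrative fallback pattern (the helper name get_constant is a convention chosen here, and dbutils is only available inside the Databricks notebook runtime), you can resolve a constant from the environment first and a widget second:

```python
import os
from typing import Optional

def get_constant(name: str, default: Optional[str] = None) -> str:
    """Resolve a constant: environment variable first, notebook widget second."""
    value = os.environ.get(name)
    if value is not None:
        return value
    try:
        # dbutils is injected by the Databricks notebook runtime;
        # widgets.get raises if no widget with that name exists
        return dbutils.widgets.get(name)  # noqa: F821
    except Exception:
        if default is None:
            raise ValueError(f"No value found for constant {name!r}")
        return default

gold_catalog = get_constant("GOLD_CATALOG", default="gold_dev01")
```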
3. Bundle Parameter Access Patterns
If you use bundle parameters, you can inject values into the jobs or workflows defined in the bundle via base_parameters, and then read them in notebooks with dbutils.widgets.get at runtime. However, since you want constants at the bundle/environment level and accessible globally, prefer target-level variables surfaced as environment variables.
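For reference, a short sketch of that parameter-passing pattern (job and task names are hypothetical); the notebook then reads the value with dbutils.widgets.get("gold_catalog"):

```yaml
resources:
  jobs:
    nightly_gold_load:
      tasks:
        - task_key: load_gold
          notebook_task:
            notebook_path: ../src/load_gold.py
            base_parameters:
              gold_catalog: ${var.gold_catalog}
```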
4. Best Practices
- Use targets and variables inside databricks.yml for environment-specific constants, for clarity and isolation between targets.
- Ensure your jobs and notebooks read these variables through one consistent mechanism (environment variables, widgets, or a small shared helper).
- Document all environment-specific variables in a dedicated section of your asset bundle repository for maintainability.
- For complex scenarios, consider an initialization module that detects the current environment and loads the correct constants automatically on notebook startup, as sketched below.
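A minimal sketch of that initialization idea, assuming the job injects the target name as an environment variable (for example via spark_env_vars with the ${bundle.target} substitution); the module name constants.py and the BUNDLE_TARGET variable are conventions chosen here, not built-ins:

```python
# constants.py -- shared via %run or packaged as a library
import os

# Per-target constants, mirroring the targets in databricks.yml
_CONSTANTS = {
    "dev":  {"gold_catalog": "gold_dev01"},
    "uat":  {"gold_catalog": "gold_tst01"},
    "prod": {"gold_catalog": "gold_prod01"},
}

def load_constants() -> dict:
    """Return the constants for the current target, defaulting to dev."""
    target = os.environ.get("BUNDLE_TARGET", "dev")
    return _CONSTANTS[target]

constants = load_constants()
gold_catalog = constants["gold_catalog"]
```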
Recommendations
- Variables via databricks.yml: the most maintainable and native approach. Define variables in the variables and targets sections and inject them into the runtime context using the substitution features of Databricks Asset Bundles.
- Config file in the workspace or on DBFS: alternatively, store a config file (JSON or YAML) with constants per environment, read it at notebook startup, and set globals accordingly, as sketched below.
- No task-level parameters needed: by using targets and variables at the bundle level, you avoid defining job/task-level parameters, which keeps your configuration cleaner and more scalable.
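A short sketch of the config-file variant, assuming one JSON file per environment deployed under a hypothetical /Workspace/Shared/config/ path (workspace files are readable with plain open() on recent Databricks Runtime versions):

```python
import json
import os

# Hypothetical path convention; point this wherever your bundle deploys config
env = os.environ.get("BUNDLE_TARGET", "dev")
config_path = f"/Workspace/Shared/config/{env}.json"

with open(config_path) as f:
    constants = json.load(f)

gold_catalog = constants["gold_catalog"]
```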
Summary:
Define environment-specific constants at the bundle level using the variables and targets keys in your bundle configuration file (databricks.yml). Surface these constants inside notebooks through environment variables or a config file read at startup, avoiding task-level parameters for global settings. This approach gives you maintainable, reusable deployment of Databricks notebooks across multiple environments.