Hello @Kjetil,
Your proposed solution of parameterizing the environment name and using that parameter in your data/ML pipeline code is a valid approach. It lets you resolve the appropriate catalog at runtime, so the same code runs unchanged across Dev, Test, and Prod.
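As a minimal sketch of that idea, assuming the catalogs follow the gold_dev / gold_test / gold_prod naming and the environment name is supplied from outside the code (for example through an environment variable or a job parameter), the catalog can be resolved at runtime like this:

import os
from pyspark.sql import SparkSession

# Environment name injected from outside the code; defaults to "dev" if not set.
env = os.getenv("ENV", "dev")

# Build the catalog name from the environment suffix: gold_dev, gold_test, gold_prod.
catalog = f"gold_{env}"

spark = SparkSession.builder.getOrCreate()
df = spark.read.table(f"{catalog}.schema.table_name")  # placeholder schema and table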
However, there is an alternative you might consider. Instead of naming the catalogs gold_dev, gold_test, and gold_prod, you could use the environment names themselves as catalog names (e.g., dev, test, prod). Some teams prefer this because it keeps the naming convention simple and makes it immediately obvious which environment a given table lives in. For example:
import os
from pyspark.sql import SparkSession

# Environment name injected from outside the code (e.g. an environment variable
# set on the cluster or passed in by the job); defaults to "dev" if not set.
env = os.getenv("ENV", "dev")

# Map the environment name to a catalog name. With this convention they are
# identical, but the mapping is the single place to change if the names diverge.
catalog_map = {
    "dev": "dev",
    "test": "test",
    "prod": "prod",
}
catalog = catalog_map[env]  # raises KeyError for an unexpected environment name

spark = SparkSession.builder.getOrCreate()
df = spark.read.table(f"{catalog}.schema.table_name")  # placeholder schema and table
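Whichever naming convention you choose, the important part is that the environment name comes from configuration rather than from the code itself. On Databricks, for instance, you could set ENV as a cluster environment variable or pass it in as a job parameter, so the same notebook or pipeline definition is promoted unchanged from Dev to Test to Prod.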