Unity Catalog and environment setup

Kjetil
Contributor

We are implementing the Databricks medallion architecture (bronze, silver, gold) and have three environments/workspaces in Databricks: Dev, Test and Prod. Each catalog in Unity Catalog points to a specific location in Azure Data Lake. It therefore seems that the (only?) solution is to name the gold catalog in dev 'gold_dev', and so on. That in turn means we need to parameterize the environment name and use that parameter, which varies across environments, in the code for our data/ML pipelines.

Example of such a solution:

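import os
from pyspark.sql import SparkSession

# Environment name read from the ENV variable; defaults to 'dev' if not set
env = os.getenv("ENV", "dev")

# Map each environment to its bronze catalog in Unity Catalog
catalog_map = {
    "dev": "bronze_dev",
    "test": "bronze_test",
    "prod": "bronze_prod",
}
if env not in catalog_map:
    raise ValueError(f"Unknown environment: {env!r}")
bronze_catalog = catalog_map[env]

spark = SparkSession.builder.getOrCreate()
# Three-level name: <catalog>.<schema>.<table>
df = spark.read.table(f"{bronze_catalog}.schema.table_name")
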
Question: Is this the preferred solution, or is there another way to do it?

Note: I've noticed that some recommend using dev, test and prod as catalogs. However, we will likely need more flexibility than gold, silver and bronze schemas alone would give us. That is why we lift these layers to the catalog level, so that below this level in the hierarchy we can define specific schemas within the gold, silver and bronze catalogs.
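
To make the layer-per-catalog idea concrete, here is a minimal sketch of how the naming could be generalized to all three layers. It assumes catalogs follow the '<layer>_<env>' pattern ('gold_dev' and so on) and that the environment comes from the same ENV variable as above. The helper name qualified_table and the schema/table names in the comment are hypothetical, used only for illustration:

```python
import os
from typing import Optional

LAYERS = ("bronze", "silver", "gold")
ENVS = ("dev", "test", "prod")


def qualified_table(layer: str, schema: str, table: str,
                    env: Optional[str] = None) -> str:
    """Build a three-level name: <layer>_<env>.<schema>.<table>.

    Assumes the '<layer>_<env>' catalog naming convention; if env is
    not passed, it is read from the ENV variable (default 'dev').
    """
    env = env or os.getenv("ENV", "dev")
    if layer not in LAYERS:
        raise ValueError(f"Unknown layer: {layer!r}")
    if env not in ENVS:
        raise ValueError(f"Unknown environment: {env!r}")
    return f"{layer}_{env}.{schema}.{table}"


# Hypothetical usage in a pipeline (schema and table names are made up):
# df = spark.read.table(qualified_table("gold", "sales", "orders"))
```

With this, pipeline code only refers to the layer and the schema below it, and the environment suffix is resolved in one place.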