Unity Catalog and environment set up
02-05-2025 03:34 AM
We are implementing the Databricks medallion architecture (bronze, silver, gold) across three environments/workspaces in Databricks: Dev, Test and Prod. Each catalog in Unity Catalog points to a specific location in Azure Data Lake. It therefore seems that the (only?) solution is to suffix each catalog name with the environment, for example naming the gold catalog in dev 'gold_dev', and so on. That in turn means we need to parameterize the environment name and use that parameter, which varies across environments, in our data/ML pipeline code.
Example of such a solution:
import os
from pyspark.sql import SparkSession

# Environment name for this workspace; defaults to 'dev' if ENV is not set
env = os.getenv("ENV", "dev")

# Map each environment to its bronze catalog
catalog_map = {
    "dev": "bronze_dev",
    "test": "bronze_test",
    "prod": "bronze_prod",
}
bronze_catalog = catalog_map[env]  # raises KeyError for an unknown environment

spark = SparkSession.builder.getOrCreate()
df = spark.read.table(f"{bronze_catalog}.schema.table_name")
Question: Is this the preferred solution, or is it possible to do it another way?
Note: I've noticed that some recommend using dev, test and prod as catalogs. However, we likely need more flexibility than simply having gold, silver and bronze as schemas. That is why we lift these layers to the catalog level, so that below this level in the hierarchy we can define specific schemas within the gold, silver and bronze catalogs.
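To make the layer-as-catalog layout concrete, here is a minimal sketch of how the naming could be generalized beyond bronze. It assumes the '<layer>_<env>' convention from above and the same ENV variable; the helper names catalog_for and table_name, and the schema/table names in the usage line, are hypothetical and only for illustration.

```python
import os
from typing import Optional

# Assumed naming convention: one catalog per layer and environment, "<layer>_<env>"
LAYERS = ("bronze", "silver", "gold")
ENVIRONMENTS = ("dev", "test", "prod")


def catalog_for(layer: str, env: Optional[str] = None) -> str:
    """Return the catalog name for a medallion layer in the given environment.

    Falls back to the ENV environment variable (default 'dev') when env is omitted.
    """
    env = env or os.getenv("ENV", "dev")
    if layer not in LAYERS:
        raise ValueError(f"Unknown layer: {layer!r}")
    if env not in ENVIRONMENTS:
        raise ValueError(f"Unknown environment: {env!r}")
    return f"{layer}_{env}"


def table_name(layer: str, schema: str, table: str, env: Optional[str] = None) -> str:
    """Build the three-level name catalog.schema.table for a layer."""
    return f"{catalog_for(layer, env)}.{schema}.{table}"
```

With this, pipeline code would only refer to layers, e.g. spark.read.table(table_name("silver", "sales", "orders")), and the environment suffix would be resolved in one place.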