Databricks Community

AdamMcGuinness · ‎08-22-2023

I have been looking at Databricks’ suggested use of catalogs, and my instinct is that a separate metastore for each SDLC environment (dev, test, prod) is preferable. If this pattern were followed, current constraints mean a separate account would be required for each environment, as we would not want to put environments in different regions just to stay within the same account. This approach yields the full benefit of the three-level namespace, because you are not giving up the top level to an environment as this "best practice" suggests:
https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/best-practices#--or...

My rationale:

By dedicating a catalog to an environment, you do not get the full benefit of the three-level namespace e.g. for a source dataset:

Catalog - bronze_systemA (One catalog dedicated to a source in the environment’s metastore)
          Schema – raw
                    Schema child objects
          Schema – historised (optional if you need to collect time series data from source)
                    Schema child objects
          Schema – curated (optional curation of source data without aggregating to other sources as you would in silver.)
                    Schema child objects

Is better than:

Catalog - bronze_all_systems_dev (one catalog dedicated to all sources by environment in same metastore)
          Schema - systemA_raw
                    Schema child objects
          Schema - systemA_historised
                    Schema child objects
          Schema - systemA_curated
                    Schema child objects
          Many more schemas


On-platform deployments from lower to higher environments would not have to manage a change in catalog name wherever an object is referenced, e.g. in a view’s SQL definition:
…..
FROM bronze_systemA.raw.table_abc

Is better than:
…..
FROM bronze_all_systems_dev.systemA_raw.table_abc

When deploying to higher environments, “_dev” needs to change everywhere it is referenced.
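
To make the comparison concrete, here is a minimal sketch of the two naming conventions. The helper names and the environment list are hypothetical and only there for illustration; the catalog, schema and table names are the ones from the examples above.

```python
# Illustrative only: these helpers are hypothetical, not a Databricks API.
ENVIRONMENTS = ["dev", "test", "prod"]


def name_with_metastore_per_env(source: str, layer: str, table: str) -> str:
    """One metastore per environment: the catalog is dedicated to a source."""
    return f"bronze_{source}.{layer}.{table}"


def name_with_catalog_per_env(env: str, source: str, layer: str, table: str) -> str:
    """One shared metastore: the catalog name carries the environment."""
    return f"bronze_all_systems_{env}.{source}_{layer}.{table}"


# Print both forms side by side for each environment.
for env in ENVIRONMENTS:
    print(
        env,
        name_with_metastore_per_env("systemA", "raw", "table_abc"),
        name_with_catalog_per_env(env, "systemA", "raw", "table_abc"),
    )
```

The first name is identical in every environment, so a view definition can be promoted unchanged. The second differs per environment, so something in the deployment has to rewrite it.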

I anticipate this may also apply to other objects such as:
   Workflows
   DLT
   Jobs
   Maybe more ...
An external connection in an external tool will only have to change its connection string for the higher environment, not the catalog name.
Binding catalogs to workspaces provides a clean method to manage data access, compared to cherry-picking schemas into ACLs and associating them with authorised users.

Interested if I have missed something and other points of view.

Thanks

-werners- · ‎08-23-2023

https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/best-practices#--or...

So basically Databricks advises one metastore for multiple envs.

AdamMcGuinness · ‎08-23-2023

Yes I am aware of that. I'm not convinced this is a "best practice".

It means that if you stay in the same metastore and use catalogs to divide up your environments as Databricks shows, you have to deal with a changing three-level namespace. You really only get a two-level namespace, as you have given the top level away to an environment.

My main concern is deploying objects from lower to higher environments when they have to cope with the changing namespace, not only on the platform but in external tools as well.

I am wondering how others are dealing with that?

jameshughes · ‎06-14-2025

We had this same dilemma, and ended up using secrets in a Key Vault-backed secret scope and dynamic notebooks to determine which "environment" we were in. Based on that, the notebook sets variables that select the appropriate catalog.

import os

# Name of the secret scope, expected as an environment variable on the cluster
DEFAULT_SECRET_SCOPE = os.environ.get("DEFAULT_SECRET_SCOPE")

# Fail early if the default secret scope is unset or empty
if not DEFAULT_SECRET_SCOPE:
    raise RuntimeError("A default secret scope has not been configured on the selected cluster.")

# Retrieve the environment name used to determine which catalog to operate in
# (dbutils is provided by the Databricks notebook runtime)
ENVIRONMENT = dbutils.secrets.get(scope=DEFAULT_SECRET_SCOPE, key="environment-name")

CATALOG_NAME = f"{ENVIRONMENT}_bronze"

-werners- · ‎08-23-2023

I understand your concern.
However, changing the catalog name while deploying can be handled by putting the 'environment' in external config and updating that while deploying.
If you want strictly separated envs, having one metastore per env is an option, but I am not sure if that is even possible using Unity for the moment. AFAIK you can only have one metastore per region.
Perhaps that will change in the future.
So for the moment you are stuck with workspace-catalog binding and using a variable env name.
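
A minimal sketch of the external-config idea, assuming the environment name lives in a small JSON document that the deployment process swaps per environment. The config key, the `render_sql` helper and the `SELECT *` are made up for illustration; only the table name comes from the thread.

```python
import json

# Hypothetical deployment config; in practice this would be a file or a
# pipeline variable that the deployment process replaces per environment.
CONFIG_JSON = '{"environment": "dev"}'

# SQL template with a placeholder for the environment part of the catalog.
VIEW_SQL_TEMPLATE = "SELECT * FROM bronze_all_systems_{env}.systemA_raw.table_abc"


def render_sql(template: str, config_json: str) -> str:
    """Substitute the environment from the external config into the SQL."""
    env = json.loads(config_json)["environment"]
    # Guard against unexpected values ending up inside an identifier.
    if not env.isidentifier():
        raise ValueError(f"Unexpected environment name: {env!r}")
    return template.format(env=env)


print(render_sql(VIEW_SQL_TEMPLATE, CONFIG_JSON))
```

With this shape, promoting to a higher environment only means changing the config value; the SQL template itself stays the same.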

SSundaram · ‎11-30-2023

You can create multiple metastores per region within an account. The limit is not a hard constraint: reach out to your account team and they can make an exception. Before doing that, consider what kind of securable sharing you will need between dev, test and prod (on different metastores). Some data science use cases will have different sharing needs than data engineering use cases.

Andrius · ‎04-11-2025

Glad to see it's not just me thinking about multiple metastores. Separate metastores by environment make total sense. This would give complete isolation between environments, and in your dev, stg and prod you can reuse catalog names without having to use a prefix or something along those lines. Also, if you want to maintain physical storage separation between envs, you can do this at the metastore level. Did anyone implement this? Keen to hear about your experience and what the limitations of such a setup could be, as right now I can't think of any.

jameshughes · ‎06-14-2025

We went with the single metastore. We ran some experiments with multiple metastores and ran into issues, especially when dealing with Unity Catalog lineage and when needing to copy data back from higher to lower environments to run various load testing scenarios. We ended up storing environment configuration information in Key Vault and reading it at runtime from notebooks.

import os

# Name of the secret scope, expected as an environment variable on the cluster
DEFAULT_SECRET_SCOPE = os.environ.get("DEFAULT_SECRET_SCOPE")

# Fail early if the default secret scope is unset or empty
if not DEFAULT_SECRET_SCOPE:
    raise RuntimeError("A default secret scope has not been configured on the selected cluster.")

# Retrieve the environment name used to determine which catalog to operate in
# (dbutils is provided by the Databricks notebook runtime)
ENVIRONMENT = dbutils.secrets.get(scope=DEFAULT_SECRET_SCOPE, key="environment-name")

CATALOG_NAME = f"{ENVIRONMENT}_bronze"

Coffee77 · ‎11-06-2025

Same approach on my end. We add a suffix such as "_dev", "_qa" or "_stg" to the catalog names in a shared Python function called something similar to "get-catalog-name(name)". All references to Delta tables must go through that function, which selects the suffix based on an environment variable called "Environment".
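
A rough sketch of what such a helper could look like. The suffix mapping, the unsuffixed "prod" entry and the `table_ref` wrapper are assumptions for illustration, not the actual shared function.

```python
import os

# Hypothetical mapping; "prod" carrying no suffix is an assumption.
_SUFFIXES = {"dev": "_dev", "qa": "_qa", "stg": "_stg", "prod": ""}


def get_catalog_name(name: str) -> str:
    """Return the catalog name with the suffix for the current environment."""
    env = os.environ.get("Environment", "").lower()
    if env not in _SUFFIXES:
        raise ValueError(f"Unknown or missing Environment value: {env!r}")
    return f"{name}{_SUFFIXES[env]}"


def table_ref(catalog: str, schema: str, table: str) -> str:
    """Build a three-level table name using the environment-aware catalog."""
    return f"{get_catalog_name(catalog)}.{schema}.{table}"
```

Notebooks would then call something like `table_ref("bronze", "systemA_raw", "table_abc")` instead of hard-coding the catalog, so the same code runs unchanged in each workspace.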

In my case, all catalogs are filtered by workspace via Unity Catalog, so it is not possible to access "qa" from "dev", etc. This means I'm not using secrets to store environment names. I'm not sure whether I'm missing something that would still make secrets necessary. Is this right, or should I be considering some hidden security problem? Thanks!

Lifelong Solution Architect Learner | Coffee & Data


Metastore - One per Account/Region Limitation
