08-22-2023 11:47 PM
Looking at Databricks' suggested use of catalogs, my instincts are leading me to the conclusion that having a separate metastore for each SDLC environment (dev, test, prod) is preferable. If this pattern were followed, I think that, due to current constraints, a separate account for each environment would be required, because we would not want to spread the environments across different regions of the same account just to get separate metastores. This approach yields the full benefit of the three-level namespace, since you are not giving up the top level to an environment as per this "best practice":
https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/best-practices#--or...
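To make the namespace trade-off concrete, here is a minimal Python sketch that builds fully qualified names under both layouts. The layout labels, the `TableRef` type and the `sales`/`curated`/`orders` names are hypothetical, and using the bare environment name as the catalog in the shared-metastore layout is an assumption for illustration only.

```python
from dataclasses import dataclass

ENVIRONMENTS = ("dev", "test", "prod")


@dataclass(frozen=True)
class TableRef:
    """Hypothetical logical reference to a table, independent of environment."""
    domain: str  # business area that would ideally own the catalog level
    schema: str
    table: str


def qualified_name(ref: TableRef, env: str, layout: str) -> str:
    """Return the catalog.schema.table name of ref in one environment.

    "catalog_per_env":   one shared metastore; the catalog level is used for
                         the environment, so the domain is folded into the schema.
    "metastore_per_env": one metastore per environment; the catalog level stays
                         free for the domain and the name is the same everywhere.
    """
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    if layout == "catalog_per_env":
        return f"{env}.{ref.domain}_{ref.schema}.{ref.table}"
    if layout == "metastore_per_env":
        return f"{ref.domain}.{ref.schema}.{ref.table}"
    raise ValueError(f"unknown layout: {layout!r}")


orders = TableRef(domain="sales", schema="curated", table="orders")
for layout in ("catalog_per_env", "metastore_per_env"):
    print(layout, [qualified_name(orders, env, layout) for env in ENVIRONMENTS])
```

With one catalog per environment the name changes on every promotion (`dev.sales_curated.orders` becomes `prod.sales_curated.orders`); with one metastore per environment it stays `sales.curated.orders`, and the environment is implied by which workspace, and therefore which metastore, the code runs against.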
My rationale:
I'm interested in whether I have missed something, and in other points of view.
Thanks
08-23-2023 01:16 AM
So basically Databricks advises one metastore for multiple envs.
08-23-2023 03:46 PM
Yes, I am aware of that. I'm not convinced this is a "best practice".
It means that if you stay in the same metastore and use catalogs to divide up your environments, as Databricks shows, you have to deal with a changing three-level namespace. You really only get a two-level namespace, because you have given the top level away to an environment.
My main concern is deploying objects from lower to higher environments when the namespace changes between them, not only on the platform but in external tools as well.
I am wondering how others are dealing with that.
08-23-2023 11:55 PM
I understand your concern.
However, changing the catalog name during deployment can be handled by putting the environment name in external configuration and updating it as part of the deployment.
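A minimal sketch of the external-config approach, assuming the deployment pipeline sets an environment variable named `DEPLOY_ENV` (a hypothetical name) and that catalogs are simply named after the environment. The `CONFIG` contents and the `sales.orders` table are placeholders; queries are written once with a `${catalog}` placeholder and rendered per environment with Python's `string.Template`.

```python
import json
import os
from string import Template
from typing import Optional

# Hypothetical per-environment settings. In a real pipeline these could come
# from a job parameter, a pipeline variable or a config file per environment.
CONFIG = json.loads(
    '{"dev": {"catalog": "dev"}, "test": {"catalog": "test"}, "prod": {"catalog": "prod"}}'
)

# The query is written once, with a placeholder instead of a hard-coded catalog.
ORDERS_QUERY = Template("SELECT * FROM ${catalog}.sales.orders")


def resolve_catalog(env: Optional[str] = None) -> str:
    """Return the catalog for the target environment.

    Uses the explicit argument if given, otherwise the (assumed) DEPLOY_ENV
    variable, falling back to "dev".
    """
    env = env or os.environ.get("DEPLOY_ENV", "dev")
    if env not in CONFIG:
        raise ValueError(f"no catalog configured for environment {env!r}")
    return CONFIG[env]["catalog"]


def render(query: Template, env: Optional[str] = None) -> str:
    """Substitute the catalog for the given (or configured) environment."""
    return query.substitute(catalog=resolve_catalog(env))


for target in ("dev", "test", "prod"):
    print(render(ORDERS_QUERY, target))
```

A variant that keeps catalog names out of the code entirely is to run `USE CATALOG` with the resolved name at the start of each job and reference tables as `schema.table`; it still depends on the environment being supplied by configuration at deploy time.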
If you want strictly separated environments, having one metastore per environment is an option, but I am not sure that is even possible with Unity Catalog at the moment. AFAIK you can only have one metastore per region.
Perhaps that will change in the future.
So for the moment you are stuck with workspace-catalog binding and using a variable env name.
11-30-2023 03:57 PM
You can create multiple metastores per region within an account. This is not a hard constraint: reach out to your account team and they can make an exception. Before doing that, consider what kind of securable sharing you will need between dev, test and prod (on different metastores). Some data science use cases will have different sharing needs than data engineering use cases.
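When environments live on separate metastores, sharing tables between them would typically go through Databricks-to-Databricks Delta Sharing. The sketch below only generates the SQL statements as strings, so it runs anywhere; the share, recipient, provider, catalog and table names and the sharing identifier are all placeholders, and identifiers are assumed to be simple names that need no quoting. The provider statements would run on the metastore that owns the data (e.g. prod), the consumer statement on the receiving one (e.g. dev).

```python
from typing import List


def provider_statements(share: str, recipient: str, sharing_id: str,
                        tables: List[str]) -> List[str]:
    """SQL for the providing metastore: create a share, add tables, grant it."""
    stmts = [f"CREATE SHARE IF NOT EXISTS {share}"]
    stmts += [f"ALTER SHARE {share} ADD TABLE {t}" for t in tables]
    # sharing_id is the receiving metastore's sharing identifier (placeholder here).
    stmts.append(f"CREATE RECIPIENT IF NOT EXISTS {recipient} USING ID '{sharing_id}'")
    stmts.append(f"GRANT SELECT ON SHARE {share} TO RECIPIENT {recipient}")
    return stmts


def consumer_statement(catalog: str, provider: str, share: str) -> str:
    """SQL for the receiving metastore: mount the share as a read-only catalog."""
    return f"CREATE CATALOG IF NOT EXISTS {catalog} USING SHARE {provider}.{share}"


for stmt in provider_statements("prod_to_dev", "dev_metastore",
                                "<sharing-identifier-of-dev-metastore>",
                                ["sales.curated.orders"]):
    print(stmt + ";")
print(consumer_statement("prod_shared", "prod_provider", "prod_to_dev") + ";")
```

Keeping the grants explicit per share makes it easier to give data science and data engineering different slices of prod without opening up the whole metastore.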