10-07-2022 02:53 AM
Hi!
So I've been looking into trying Unity Catalog since it seems to add many great features.
But one thing I can't get my head around is the fact that we can't (shouldn't?) use multiple metastores in the same region in UC.
Let me explain my use case:
We have two environments, development and production, with one Databricks workspace each.
We are using the medallion architecture, so our data is organized like:
bronze.source_system.dataset2
bronze.source_system.dataset1
Now what I want to do is use this naming convention for all tables in UC, but that's not possible because tables stored in dev and prod would collide in a shared metastore. And the solution of adding a prefix or suffix somewhere in the table name is not very elegant imho.
We could do something like:
prod_bronze.source_system.dataset2
prod_bronze.source_system.dataset1
or
prod.bronze_source_system.dataset2
prod.bronze_source_system.dataset1
But then our codebase needs to keep track of which environment the code is being executed in, so that it can select the correct table in our pipeline tasks.
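To make that bookkeeping concrete, here is a minimal sketch of what it could look like with the second naming scheme (`prod.bronze_source_system.dataset2`). The helper name `table_name`, the `PIPELINE_ENV` environment variable, and the `dev` catalog name are all assumptions for illustration, not anything UC provides.

```python
import os

# Hypothetical set of environments; each is assumed to map to a catalog of the same name.
VALID_ENVS = {"dev", "prod"}


def table_name(layer, source_system, dataset, env=None):
    """Build a three-level name like prod.bronze_source_system.dataset2.

    `env` falls back to the PIPELINE_ENV environment variable (an assumed
    convention), and to "dev" if that is not set either.
    """
    env = env or os.environ.get("PIPELINE_ENV", "dev")
    if env not in VALID_ENVS:
        raise ValueError(f"Unknown environment: {env!r}")
    # catalog = environment, schema = layer + source system, table = dataset
    return f"{env}.{layer}_{source_system}.{dataset}"


print(table_name("bronze", "source_system", "dataset2", env="prod"))
```

Every pipeline task would then have to go through a helper like this instead of hard-coding `bronze.source_system.dataset2`, which is exactly the extra coupling I would like to avoid.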
So what I would like to do is use one metastore per environment, which would also mitigate another issue for us: the fact that we have to store all managed tables in the same storage account, even if they are created in different environments. That is really not an option for us. Sure, we can use external tables, but that is still not great.
Thankful for any input on this. How does your solution look when using UC in sandbox/dev/prod environments?
Thanks!
- Labels: Unity Catalog