03-11-2024 09:58 PM
In my organization, we use Databricks Unity Catalog, and we have a metastore created for our region that holds all of our workspaces. When we created the metastore last year, we set a metastore root location for it. (If I remember correctly, the metastore root path was not optional at that time, but I could be wrong.)
With this setup, we have created many catalogs, external locations, and user assignments across our 20+ workspaces. None of our catalogs use the metastore root path for managed tables. All of the catalogs have their own storage locations for managed tables. We want our admins to follow this practice of using a separate storage path for each catalog, but it is not enforced. They could make a mistake and use the metastore root for their catalogs.
Now we see the option to create a metastore without root storage in Unity Catalog. This is ideal for us, as it would force admins to always define a separate storage path whenever they create a catalog.
But we are not sure how to implement this in our existing metastore. Apparently we can't remove the root storage from the metastore now. We could delete the metastore, create a new one, and then move all the workspaces into the new one, but we are not sure whether this would break all the user assignments we have made so far on the existing catalogs, external locations, workspaces, etc.
Could anyone help us find a solution here?
Thanks in advance
03-12-2024 02:59 AM
Hello,
What I did was isolate the root storage from the workspaces (it cannot be reached from either a network perspective or an authentication perspective). It is ugly, but it works.
What you could try is to do this via the API (for example, update the metastore to a NULL storage root). Sometimes I was able to achieve more using the API than the UI (even undocumented things).
I would avoid migrating the entire metastore unless your entire UC setup is based on code (all CREATE statements for UC objects).
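A minimal sketch of what such an API attempt might look like, using only the Python standard library. The endpoint path `PATCH /api/2.1/unity-catalog/metastores/{id}` is the Unity Catalog "update metastore" call; whether it accepts a null `storage_root` is an assumption here (the suggestion above describes it as undocumented), and the helper name, host, ID, and token are hypothetical. The function only builds the request and does not send it, so it can be inspected first.

```python
import json
import urllib.request


def build_metastore_patch(host: str, metastore_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a PATCH request that asks for a null storage root.

    Assumption: the update-metastore endpoint may or may not accept
    "storage_root": null. This is untested, undocumented behaviour.
    """
    url = f"{host.rstrip('/')}/api/2.1/unity-catalog/metastores/{metastore_id}"
    body = json.dumps({"storage_root": None}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical values, for illustration only.
req = build_metastore_patch("https://example.cloud.databricks.com", "my-metastore-id", "my-token")
print(req.get_method(), req.full_url)
```

Sending it would be a call to `urllib.request.urlopen(req)`; it seems safer to try that against a throwaway metastore first, since the API may simply reject the field.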
05-01-2024 10:04 PM
Hi @Wojciech_BUK
Thanks for your comment. That is exactly what we have right now. The metastore root storage is not reachable from any of the workspaces, so nobody can put any data there. But people could still mistakenly create catalogs that use it. I guess we'll work with that for now.
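Since nothing enforces the practice, one option is to audit for such mistakes periodically. The sketch below assumes the catalog list has already been fetched (for example from `GET /api/2.1/unity-catalog/catalogs`) as a list of dicts, and that a catalog created without its own managed location has no `storage_root` field; both the function name and that interpretation are assumptions, and the storage paths are made up.

```python
def catalogs_on_metastore_root(catalogs, metastore_root):
    """Return names of catalogs that have no storage root of their own,
    or whose storage root sits under the metastore root path."""
    root = metastore_root.rstrip("/") + "/"
    flagged = []
    for cat in catalogs:
        storage_root = cat.get("storage_root") or ""
        # Assumption: a missing storage_root means the catalog falls back
        # to the metastore root for managed tables.
        uses_default = not storage_root
        under_root = (storage_root.rstrip("/") + "/").startswith(root)
        if uses_default or under_root:
            flagged.append(cat["name"])
    return flagged


# Hypothetical sample data, for illustration only.
sample = [
    {"name": "sales", "storage_root": "abfss://sales@acct.dfs.core.windows.net/managed"},
    {"name": "scratch"},
    {"name": "bad", "storage_root": "abfss://root@meta.dfs.core.windows.net/ms/bad"},
]
print(catalogs_on_metastore_root(sample, "abfss://root@meta.dfs.core.windows.net/ms"))
```

Built-in catalogs may also show up in the result because they have no managed location of their own, so an allow-list for those would probably be needed.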
Thanks for your suggestion on using APIs. Worth taking a look into.
Thanks.
05-23-2024 11:20 AM
If you have a way to reproduce the metastore objects, it is better to delete this metastore and create a new one with no metastore root bucket. That will also ensure that a catalog cannot be created without specifying a location. If you have locked down your metastore root bucket, it can't be used as a catalog managed location anyway, because you will not be able to reach that external location.
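For the "reproduce object creation from code" route, a small sketch of generating catalog DDL that always carries an explicit location. It relies on the Databricks SQL form `CREATE CATALOG ... MANAGED LOCATION '...'`; the helper name, catalog name, and path are hypothetical, and the validation is deliberately crude.

```python
def create_catalog_sql(name: str, managed_location: str) -> str:
    """Build a CREATE CATALOG statement that always names a managed location."""
    if not managed_location.strip():
        raise ValueError("a managed location is required for every catalog")
    # Crude guard instead of real escaping: refuse quoting characters outright.
    if "`" in name or "'" in managed_location:
        raise ValueError("quotes and backticks are not supported in this sketch")
    return (
        f"CREATE CATALOG IF NOT EXISTS `{name}` "
        f"MANAGED LOCATION '{managed_location}'"
    )


# Hypothetical catalog name and path, for illustration only.
print(create_catalog_sql("sales", "abfss://sales@acct.dfs.core.windows.net/managed"))
```

Keeping statements like this in version control is what makes rebuilding the catalogs in a new metastore feasible; grants and external locations would need the same treatment.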
Friday
Does removing the ADLS path set as the metastore location have any effect? The ADLS path is tied to one project, and people from other projects may want to use the metastore for catalog creation. We do not want to incur any charges for their data storage or processing. I believe only one metastore is allowed per tenant per region. How could this be solved?
Friday
Are you using the same service principal to configure connections to all external locations, or is the service principal for the "default" ADLS different? If it is different, remove the Contributor roles that let it access the default ADLS, and you can be sure catalogs will never be created there 🙂 Maybe this can be a workaround. On the other hand, if you already have catalogs in the default ADLS, you can explore using "deep clone" to clone your tables to another one.
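A minimal sketch of the deep clone idea: generating one `CREATE TABLE ... DEEP CLONE ...` statement per table so the data lands in a catalog with its own storage. The function name, catalog names, and table list are hypothetical, and the target catalog and schemas are assumed to exist already.

```python
def deep_clone_statements(tables, source_catalog, target_catalog):
    """Build one DEEP CLONE statement per 'schema.table' entry."""
    statements = []
    for table in tables:
        statements.append(
            f"CREATE TABLE IF NOT EXISTS {target_catalog}.{table} "
            f"DEEP CLONE {source_catalog}.{table}"
        )
    return statements


# Hypothetical names, for illustration only.
for stmt in deep_clone_statements(["finance.invoices", "finance.payments"], "old_cat", "new_cat"):
    print(stmt)
```

In a notebook, each statement could then be passed to `spark.sql(stmt)`. A deep clone copies the data files, so storage is used in both locations until the old tables are dropped.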