Wednesday
I learned that our Unity Catalog metastore was created in the default storage account of our Databricks workspace. This storage account has system-denied access policies, so we cannot see the data inside. Would it be a security risk for us to store production data in it? If so, what should we do?
Thanks,
Fatima
Wednesday
Storing production data in the default storage account of your Databricks workspace, which has system-denied access policies, could indeed pose a security risk. This is because the system-denied access policies prevent you from seeing the data inside, which can hinder your ability to manage and secure the data effectively.
To mitigate this risk, you should consider the following steps:
Use External Locations: Databricks recommends using external locations rather than relying on the default storage account. External locations allow you to define storage credentials and manage access control policies more effectively. This ensures that you have the necessary visibility and control over your data.
Configure Managed Storage Locations: You can configure managed storage locations at the catalog or schema level, overriding the default metastore storage location. This allows you to isolate storage for managed tables and ensure that access controls are properly enforced.
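As a rough illustration of the two steps above, here is a minimal sketch that builds the Databricks SQL statements for an external location and for catalog-level and schema-level managed locations. All names (`prod_loc`, `prod_cred`, `prod`, `sales`) and the `abfss://` URL are placeholders rather than values from this thread, and the storage credential is assumed to exist already. The helper functions are only for illustration; in a notebook you would pass each string to `spark.sql(...)`.

```python
# Sketch only: builds SQL strings; nothing is executed against a workspace.
# All object names and the storage URL below are hypothetical placeholders.

def quote_ident(name: str) -> str:
    """Wrap an identifier in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def external_location_sql(name: str, url: str, credential: str) -> str:
    """Register a cloud storage path as a Unity Catalog external location."""
    return (
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS {quote_ident(name)} "
        f"URL '{url}' WITH (STORAGE CREDENTIAL {quote_ident(credential)})"
    )


def catalog_sql(catalog: str, url: str) -> str:
    """Create a catalog whose managed tables are stored under `url`."""
    return (
        f"CREATE CATALOG IF NOT EXISTS {quote_ident(catalog)} "
        f"MANAGED LOCATION '{url}'"
    )


def schema_sql(catalog: str, schema: str, url: str) -> str:
    """Create a schema that overrides the catalog's managed location."""
    return (
        f"CREATE SCHEMA IF NOT EXISTS {quote_ident(catalog)}.{quote_ident(schema)} "
        f"MANAGED LOCATION '{url}'"
    )


if __name__ == "__main__":
    base = "abfss://data@examplestorage.dfs.core.windows.net"  # placeholder
    statements = [
        external_location_sql("prod_loc", f"{base}/prod", "prod_cred"),
        catalog_sql("prod", f"{base}/prod"),
        schema_sql("prod", "sales", f"{base}/prod/sales"),
    ]
    for stmt in statements:
        print(stmt)  # on Databricks: spark.sql(stmt)
```

The managed location paths must fall under a path that is already covered by an external location, which is why the external location statement comes first.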
Wednesday
The external locations Walter mentioned are configured as the "storage location" at either the catalog or schema level. We set these for each schema in our lakehouse, and it makes for a very clean physical implementation.
Thursday
Thank you both for the reply. However, my question is: Unity Catalog is not just catalogs, schemas, and tables, right? Even if we configure the desired external location for these, other things are still being stored in the metastore, which itself is in an undesired location. The issue I see is that the storage location of the metastore is not modifiable. It seems we need to create a new metastore and remove the old one. I'm wondering how we can do this without losing data.
Thanks for your attention!
Thursday
> Unity Catalog is not just catalogs, schemas, and tables, right?
It kind of is, but also not. Unity Catalog is mainly metadata (table structures, column types, etc.) and user permissions. Unity Catalog is not data, unless you've done something unusual with your data location configuration. You can think of Unity Catalog mainly as configuration.
> The issue I see is that the storage location of the metastore is not modifiable. It seems we need to create a new metastore and remove the old one.
Correct, you cannot modify the storage location. You would need to export the metadata, delete the existing metastore, create a new metastore with the desired configuration, then import the metadata. There is a migration tool that can apparently migrate metastores. I haven't tried it, but it's at GitHub - databrickslabs/migrate: Old scripts for one-off ST-to-E2 migrations. Use "terraform exporte....
> I'm wondering how we can do this without losing data.
What kind of data are you talking about here? If you delete a metastore, you will lose all the metadata (unless that migration tool works), but if your raw data is in an external location, you wouldn't lose that.
Thursday
You will need to back up the current metastore, including the metadata, and then recreate the catalogs, schemas, and tables on the new metastore.
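One possible way to capture the table definitions for such a backup is sketched below. It assumes the metadata is read through Unity Catalog itself (the `system.information_schema.tables` view and `SHOW CREATE TABLE`) rather than from the storage account. The function names and the sample table list are hypothetical. This covers table DDL only: grants, views, volumes, and the data of managed tables would need separate handling.

```python
# Sketch only: prepares the SQL needed to dump table DDL from a metastore.
# On Databricks, each statement would be run with spark.sql(...); here the
# statements are just built and printed.

from typing import Iterable, List, Tuple

# Inventory query: lists every table visible to the current user,
# skipping the built-in information_schema views.
INVENTORY_SQL = (
    "SELECT table_catalog, table_schema, table_name "
    "FROM system.information_schema.tables "
    "WHERE table_schema <> 'information_schema'"
)


def quote_ident(name: str) -> str:
    """Wrap an identifier in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def show_create_statements(tables: Iterable[Tuple[str, str, str]]) -> List[str]:
    """Build one SHOW CREATE TABLE statement per (catalog, schema, table)."""
    return [
        "SHOW CREATE TABLE " + ".".join(quote_ident(part) for part in row)
        for row in tables
    ]


if __name__ == "__main__":
    # Hypothetical inventory rows, standing in for the query result.
    sample = [("prod", "sales", "orders"), ("prod", "sales", "customers")]
    print(INVENTORY_SQL)
    for stmt in show_create_statements(sample):
        print(stmt)
```

The DDL returned by each statement could then be saved to files and replayed against the new metastore, with storage paths adjusted where needed.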
Thursday
The default storage account is not accessible. How can we back it up?