Data Governance

In a new workspace without any data using the Unity Catalog, can I hide/delete the hive_metastore, main, samples, and system catalogs?

MetaRossiVinli
Contributor

I am setting up a new workspace that will use the Unity Catalog. I want all data stored in the Unity Catalog in the following catalogs: dev, staging, prod. I want to prevent users from accidentally reading and writing data elsewhere.

For the above situation, can I hide and/or delete the following default catalogs?

  • hive_metastore
  • main
  • samples
  • system

Accepted Solution

Anonymous
Not applicable

@Kevin Rossi Unfortunately, hive_metastore can't be hidden as of now. UC does not need it, but a Databricks workspace doesn't work well without the default RDS connections, and removing them would require changes in the way DBR/Spark starts up. Eventually we will have a UC-only workspace with no references to HMS, but that doesn't exist today (Eng is working on it).

Here are a couple of things you can do as a workaround.

Change the default catalog from hive_metastore to another catalog using the "spark.databricks.sql.initial.catalog.name" property.

The default catalog can also be set when assigning the workspace to a metastore. If the workspace is already assigned, unassign it and reassign it with a default catalog.
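
As an illustration of the first workaround, here is a minimal Python sketch that adds the property to a cluster definition before the cluster is created. The property name is the one quoted above. The helper name `with_default_catalog`, the cluster name, and the assumption that the definition is a dict with a `spark_conf` field are illustrative only, and nothing here calls Databricks.

```python
from copy import deepcopy

# Spark property quoted in the answer above.
DEFAULT_CATALOG_KEY = "spark.databricks.sql.initial.catalog.name"


def with_default_catalog(cluster_spec: dict, catalog: str) -> dict:
    """Return a copy of cluster_spec whose spark_conf sets the default catalog.

    Hypothetical helper: it only edits a dict and does not contact Databricks.
    """
    if not catalog or not catalog.strip():
        raise ValueError("catalog name must be non-empty")
    spec = deepcopy(cluster_spec)  # leave the caller's dict untouched
    conf = dict(spec.get("spark_conf") or {})
    conf[DEFAULT_CATALOG_KEY] = catalog.strip()
    spec["spark_conf"] = conf
    return spec


# Hypothetical cluster definition; only the spark_conf entry matters here.
base_spec = {
    "cluster_name": "etl-dev",
    "spark_conf": {"spark.sql.shuffle.partitions": "64"},
}
dev_spec = with_default_catalog(base_spec, "dev")
print(dev_spec["spark_conf"])
```

On a running cluster, a query such as `SELECT current_catalog()` should show whether the setting took effect.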

The samples and system catalogs are read-only and can't be removed.

Regarding the main catalog, we have a feature, available on request, called catalog-to-workspace binding. By default, a catalog is bound to all workspaces, but with this feature a catalog can be bound only to the desired workspaces. In this case, if we disable all workspace access to the main catalog, it won't be visible in any workspace. Please reach out to your Databricks contact to onboard your account to this feature.
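
To make the binding behaviour described above concrete, here is a small toy model in Python. It is not the Databricks API: the `Catalog` class, the `bind_only_to` and `visible_in` methods, and the workspace names are all hypothetical, and the model only encodes the two rules stated in the answer (bound to every workspace by default, visible only where bound once restricted).

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class Catalog:
    """Toy stand-in for a catalog; not a Databricks object."""
    name: str
    # None models the default: the catalog is bound to every workspace.
    bound_workspaces: Optional[Set[str]] = None

    def bind_only_to(self, workspaces: Iterable[str]) -> None:
        """Restrict the catalog to the given workspaces (possibly none)."""
        self.bound_workspaces = set(workspaces)

    def visible_in(self, workspace: str) -> bool:
        if self.bound_workspaces is None:
            return True
        return workspace in self.bound_workspaces


def visible_catalogs(catalogs: Iterable[Catalog], workspace: str) -> List[str]:
    """Names of the catalogs that the given workspace can see, sorted."""
    return sorted(c.name for c in catalogs if c.visible_in(workspace))


main = Catalog("main")
research = Catalog("research")
prod = Catalog("prod")

main.bind_only_to([])                   # no workspace access: hidden everywhere
research.bind_only_to(["research-ws"])  # exposed to one workspace only

catalogs = [main, research, prod]
print(visible_catalogs(catalogs, "research-ws"))  # ['prod', 'research']
print(visible_catalogs(catalogs, "prod-ws"))      # ['prod']
```

In this model a binding controls visibility only. Permissions on the catalog would still be managed separately.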


REPLIES

Short answers that I derived from the above:

  • hive_metastore - This cannot be deleted or hidden, but the default catalog can be changed with the above instructions.
  • main - There is a new feature that can unbind catalogs from workspaces. This would remove access as I desire. TODO request our account to be onboarded for this.
  • samples - Read only and cannot be removed.
  • system - Read only and cannot be removed.

OK, cool, thanks. I think that will enable me to effectively govern our users as desired. I am a fan of keeping everything cleanly separated. We are going to have two workspaces for our team:

  1. research for demos, dabbling, and testing new Databricks features
  2. prod for production code/notebooks that are vetted through Git PRs and use dev/staging/prod branches

Going forward, I would support features that enable data science teams to govern production pipelines in a clean manner. Removing unneeded databases/catalogs and improving the management of pipelines would be favorable in my opinion. I think everything that we need to implement this exists now.

Features like the 'catalog to workspace binding' help keep concerns separated; i.e. exposing a research catalog to only our research workspace and preventing access to that catalog in the prod workspace. This feature will prevent us from accidentally writing to the research catalog from a prod pipeline; we will also enforce this with permissions... but I like redundancy.

Avvar2022
Contributor

@Kevin Rossi @John Lourdu - I am also new to Databricks and am setting up an environment.

By default, "all users" have read access to the catalogs listed below.

My question: I see an option to revoke read access. Must "all users" have read access to all of these catalogs? If I revoke it, will there be any impact?

  • main - By default, "all users" have read access, and I see an option to revoke it. If I revoke access, will there be any impact?
  • samples - By default, "all users" have read access, and I see an option to revoke it. If I revoke access, will there be any impact?
  • system - By default, "all users" have read access, and I see an option to revoke it. If I revoke access, will there be any impact?
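
One cautious way to approach the question above is to inspect the current grants on each catalog before revoking anything. The Python sketch below only builds the SQL strings, so they can be reviewed before being run in a notebook or SQL editor. The helper names are hypothetical. The principal name `account users` and the privilege `USE CATALOG` are assumptions about what the "all users" entry maps to, so check them against the SHOW GRANTS output. Whether a revoke is permitted on the read-only samples and system catalogs is not established in this thread.

```python
import re

# Conservative identifier check so the generated SQL cannot be malformed.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _checked_catalog(name: str) -> str:
    if not _IDENT.match(name):
        raise ValueError(f"unexpected catalog name: {name!r}")
    return name


def show_grants_sql(catalog: str) -> str:
    """SQL that lists the current grants on a catalog."""
    return f"SHOW GRANTS ON CATALOG {_checked_catalog(catalog)}"


def revoke_sql(privilege: str, catalog: str, principal: str) -> str:
    """SQL that revokes one privilege on a catalog from one principal."""
    if "`" in principal:
        raise ValueError("principal must not contain backticks")
    return (
        f"REVOKE {privilege} ON CATALOG {_checked_catalog(catalog)} "
        f"FROM `{principal}`"
    )


# Step 1: review what is granted today on each default catalog.
for catalog in ("main", "samples", "system"):
    print(show_grants_sql(catalog))

# Step 2 (assumed privilege and principal names): revoke on main only.
print(revoke_sql("USE CATALOG", "main", "account users"))
```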
