Data Governance
Join discussions on data governance practices, compliance, and security within the Databricks Community. Exchange strategies and insights to ensure data integrity and regulatory compliance.

In a new workspace without any data using the Unity Catalog, can I hide/delete the hive_metastore, main, samples, and system catalogs?

MetaRossiVinli
Contributor

I am setting up a new workspace that will use the Unity Catalog. I want all data stored in the Unity Catalog in the following catalogs: dev, staging, prod. I want to prevent users from accidentally reading and writing data elsewhere.

For the above situation, can I hide and/or delete the following default catalogs?

  • hive_metastore
  • main
  • samples
  • system
1 ACCEPTED SOLUTION

Anonymous
Not applicable

@Kevin Rossi Unfortunately, hive_metastore can't be hidden as of now. It isn't needed for UC, but a Databricks workspace doesn't work well without the default RDS connections, which would require changes to the way DBR/Spark starts up. Eventually there will be a UC-only workspace with no references to HMS, but that doesn't exist today (engineering is working on it).

Here are a couple of workarounds:

Change the default catalog from hive_metastore to another catalog by setting the "spark.databricks.sql.initial.catalog.name" Spark property.

The default catalog can also be set when assigning the workspace to a metastore. If the workspace is already assigned, unassign it and reassign it with a default catalog.
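As a sketch of those two options in practice (the Spark property name comes from the answer above; the `dev` catalog name is just this thread's example):

```sql
-- Option 1: set as a cluster-level Spark config so hive_metastore is no
-- longer the default catalog for that cluster:
--   spark.databricks.sql.initial.catalog.name dev

-- You can also switch catalogs per session, so unqualified table names
-- resolve into the intended catalog:
USE CATALOG dev;

-- Verify what the session currently resolves against:
SELECT current_catalog();
```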

The samples and system catalogs are read-only catalogs; they can't be removed.

Regarding the main catalog, we have a feature request called catalog-to-workspace binding. By default, a catalog is bound to all workspaces, but with this feature you can bind a catalog only to the desired workspaces. If you disable all workspace access to the main catalog, it won't be visible in any workspace. Please reach out to your Databricks contact to onboard your account to this feature.
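For accounts that have been onboarded to catalog-to-workspace binding, the intent described above might look roughly like this in SQL (the syntax is a hedged sketch; confirm against current Databricks documentation):

```sql
-- By default a catalog is OPEN (visible to every workspace attached to
-- the metastore). Marking it ISOLATED binds it only to explicitly
-- assigned workspaces, so main disappears from all other workspaces:
ALTER CATALOG main SET ISOLATION MODE ISOLATED;
```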


3 REPLIES


Short answers that I derived from the above:

  • hive_metastore - This cannot be deleted or hidden, but the default catalog can be changed with the above instructions.
  • main - There is a new feature that can unbind catalogs from workspaces; this would remove access as I desire. TODO: request that our account be onboarded for this feature.
  • samples - Read-only and cannot be removed.
  • system - Read-only and cannot be removed.

OK, cool, thanks. I think that will enable me to effectively govern our users as desired. I am a fan of keeping everything cleanly separated. We are going to have two workspaces for our team:

  1. research for demos, dabbling, and testing new Databricks features
  2. prod for production code/notebooks that are vetted through Git PRs and use dev/staging/prod branches

Going forward, I would support features that enable data science teams to govern production pipelines in a clean manner. Removing unneeded databases/catalogs and improving pipeline management would be favorable in my opinion. I think everything we need to implement this exists now.

Features like the 'catalog to workspace binding' help keep concerns separated; i.e. exposing a research catalog to only our research workspace and preventing access to that catalog in the prod workspace. This feature will prevent us from accidentally writing to the research catalog from a prod pipeline; we will also enforce this with permissions... but I like redundancy.

Avvar2022
Contributor

@Kevin Rossi @John Lourdu - I am also new to Databricks and am setting up an environment.

By default, "all users" have read access to the catalogs mentioned below.

My question: I see an option to revoke that read access. Must "all users" have read access to all these catalogs, or can I revoke it without any impact?

  • main - by default "all users" have read access; if I revoke it, will there be any impact?
  • samples - by default "all users" have read access; if I revoke it, will there be any impact?
  • system - by default "all users" have read access; if I revoke it, will there be any impact?
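For anyone experimenting with this, here is a hedged sketch of inspecting and revoking such a grant (the `account users` group is the typical default grantee, but verify in your own workspace before revoking anything):

```sql
-- See who currently holds privileges on the catalog:
SHOW GRANTS ON CATALOG main;

-- Revoke the default read access from the built-in group:
REVOKE ALL PRIVILEGES ON CATALOG main FROM `account users`;
```

Note that samples and system are Databricks-managed, read-only catalogs, so revoking grants would affect only who can see and query them, not the catalogs themselves.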
