Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Unexpected Schema ID Folder Creation in Unity Catalog External Location

Sunil_Poluri
New Contributor

I've set up Unity Catalog with an external location pointing to a storage account. For each schema, I've configured a dedicated container path. For example:

abfss://schemas@<storage_account>.dfs.core.windows.net/_unityStorage/schemas/<schema_id>

When I create a schema, a schema_id is generated. I expect this schema_id to be reflected as a folder under the schema container path, like:

/_unityStorage/schemas/<schema_id>

However, I've noticed that this folder doesn't appear immediately, presumably because no objects (like tables) exist yet.

Here's what I've observed:

  • When I create a Delta table within the schema, I expect the table data to be stored under the schema's storage path.
  • Similarly, when I create a DLT pipeline targeting the same schema, I expect the tables to be stored under the same schema path.
  • But instead, a new schema ID folder gets created in the storage account under the schema container, even though the schema name is the same.

My question is: Under what conditions does Unity Catalog generate a new schema_id folder in the storage account, even when the schema name hasn't changed?

Any insights or documentation references would be greatly appreciated!

1 REPLY

Louis_Frolio
Databricks Employee

Hey @Sunil_Poluri, I did some research (and learned a few things along the way). Here is what I found.

Unity Catalog manages cloud storage mapping for schemas using internal IDs (schema_id) to ensure data isolation, governance, and uniqueness within a metastore, even if schema names are the same across catalogs or across time. Here is a summary of the key factors that influence when new schema_id folders are created under an external location, even if the schema name hasn't changed:

1. Schema Drop and Re-create

Behavior: Unity Catalog assigns a unique internal identifier (schema_id) to every schema when it is created.
If a schema is dropped and re-created, even with an identical name, a new schema_id (and thus a new folder) is generated. Old object data persists in the previous folder, but new objects (managed tables) will write to the new schema_id directory.
Implication: This is the most common reason for seeing multiple schema_id folders for a schema name.
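To check whether this has happened, it can help to group a storage listing by schema_id folder. The following is a minimal sketch, assuming the folder layout from the question (`_unityStorage/schemas/<schema_id>`). The account name, the IDs, and the `tables/...` suffixes in the sample listing are made up for illustration, and `group_by_schema_id` is a hypothetical helper, not a Databricks API.

```python
import re
from collections import defaultdict

# Matches ".../_unityStorage/schemas/<schema_id>/..." with one or two leading
# underscores and any letter case. The layout is taken from the question;
# adjust the pattern if your storage uses a different prefix.
SCHEMA_FOLDER = re.compile(r"/_{1,2}unitystorage/schemas/([^/]+)", re.IGNORECASE)


def group_by_schema_id(paths):
    """Map each schema_id folder to the object paths found beneath it."""
    groups = defaultdict(list)
    for path in paths:
        match = SCHEMA_FOLDER.search(path)
        if match:
            groups[match.group(1)].append(path)
    return dict(groups)


# Hypothetical listing: account name, IDs, and suffixes are illustrative only.
listing = [
    "abfss://schemas@acct.dfs.core.windows.net/_unityStorage/schemas/aaa-111/tables/t1",
    "abfss://schemas@acct.dfs.core.windows.net/_unityStorage/schemas/aaa-111/tables/t2",
    "abfss://schemas@acct.dfs.core.windows.net/_unityStorage/schemas/bbb-222/tables/t3",
]
folders = group_by_schema_id(listing)
print(sorted(folders))
```

More than one key in the result for a single schema name suggests the schema was re-created at some point.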

 

2. Publishing Tables via DLT or Pipelines

When using Databricks Delta Live Tables (DLT) pipelines, table storage always adheres to the current mapping of the schema's internal ID. If a pipeline (or notebook) triggers creation of a schema that doesn't yet exist (for example, by referencing it as a target), Unity Catalog creates a new schema and assigns a new schema_id.
If there was a schema deletion and subsequent re-creation outside your awareness (or automation runs at unpredictable times), this could result in the schema_id shifting even if the schema name appears constant.

3. Direct Versus Indirect Schema Creation Channels

Databricks workflows, DLT, Databricks Asset Bundles, and manual UI actions all use the same underlying APIs, but automation (for example, CI/CD-driven schema creation in Asset Bundles or infrastructure-as-code) can lead to unintentional dropping and re-creating of schemas under the hood, causing new IDs to be assigned.
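Schema creation logic that drops and re-creates a schema (for example, DROP SCHEMA ... CASCADE followed by CREATE SCHEMA) instead of using CREATE SCHEMA IF NOT EXISTS replaces the schema and generates a new schema_id folder. Note that a plain CREATE SCHEMA against an existing schema fails with an error rather than replacing it.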
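A guarded creation statement leaves an existing schema, and therefore its schema_id, untouched. Here is a small sketch of a helper that builds such a statement for use in automation. `create_schema_sql` and `quote_ident` are hypothetical helpers, and the catalog and schema names are placeholders.

```python
def quote_ident(name):
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def create_schema_sql(catalog, schema, managed_location=None):
    """Build an idempotent CREATE SCHEMA statement (no drop, no replace)."""
    # Reject quotes instead of guessing at an escaping scheme.
    if managed_location is not None and "'" in managed_location:
        raise ValueError("managed_location must not contain single quotes")
    sql = f"CREATE SCHEMA IF NOT EXISTS {quote_ident(catalog)}.{quote_ident(schema)}"
    if managed_location is not None:
        sql += f" MANAGED LOCATION '{managed_location}'"
    return sql


# Placeholder names; substitute your own catalog and schema.
statement = create_schema_sql("main", "sales")
print(statement)
```

In a notebook or job, the resulting string would typically be passed to `spark.sql(...)`. That call is left out here so the sketch runs without a cluster.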

4. Backing Storage or Location Changes

Changing the storage root location property on the schema or re-registering it can also be a scenario where a new schema_id is minted. However, most documentation and troubleshooting guidance emphasize schema drops and re-creations (planned or accidental) as primary drivers.

5. Multiple Metastores or Region/Workspace Boundaries

If running with multiple metastores or cross-region/catalog patterns, schemas with the same name in different metastores are always mapped to distinct internal IDs and thus distinct folders.

6. No Object, No Folder Until First Table

As noted, the schema_id folder is not created in the underlying storage until a managed object (such as a Delta table) is created within the schema. This lazy provisioning is expected behavior for storage efficiency.

Important Additional Notes

The internal IDs are not exposed in user-facing controls; only the folder names in storage and some low-level APIs reveal them. Schema_id changes are not triggered by table creation alone unless the schema itself is new (i.e., it did not exist at the time of table creation).
If you see unexpected new schema_id folders, audit logs, schema version histories, or CI/CD system activity may provide clues (look for drop/create activity).
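As a rough illustration of the audit step, the sketch below scans exported audit records for a schema delete followed by a later create. It assumes each record is a dict with `event_time`, `service_name`, and `action_name` keys, and that the relevant action names are `createSchema` and `deleteSchema`. Verify both assumptions against your own audit logs. The sample records are invented and assumed to be already filtered to a single schema.

```python
# Assumed action names; confirm them against your workspace's audit logs.
DROP_CREATE_ACTIONS = {"deleteSchema", "createSchema"}


def schema_lifecycle_events(events):
    """Return schema delete/create events, oldest first."""
    hits = [
        e for e in events
        if e.get("service_name") == "unityCatalog"
        and e.get("action_name") in DROP_CREATE_ACTIONS
    ]
    # ISO-8601 timestamps in one consistent format sort correctly as strings.
    return sorted(hits, key=lambda e: e["event_time"])


def was_recreated(events):
    """True if a deleteSchema event is later followed by a createSchema event."""
    seen_delete = False
    for e in schema_lifecycle_events(events):
        if e["action_name"] == "deleteSchema":
            seen_delete = True
        elif seen_delete:
            return True
    return False


# Invented sample records for one schema.
sample = [
    {"event_time": "2024-05-02T10:00:00Z", "service_name": "unityCatalog", "action_name": "createSchema"},
    {"event_time": "2024-05-01T09:00:00Z", "service_name": "unityCatalog", "action_name": "createSchema"},
    {"event_time": "2024-05-02T09:59:00Z", "service_name": "unityCatalog", "action_name": "deleteSchema"},
    {"event_time": "2024-05-02T09:00:00Z", "service_name": "unityCatalog", "action_name": "getSchema"},
]
print(was_recreated(sample))
```

A positive result points at the time window in which to look for the job, pipeline, or deployment that dropped the schema.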

Hope this helps with your understanding.

Cheers, Louis.