Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Registering Delta tables from external storage GCS , S3 , Azure Blob in Databricks Unity Catalog

muaaz
Visitor

Hi everyone,

I am currently working on a migration project from Azure Databricks to GCP Databricks, and I need some guidance from the community on best practices around registering external Delta tables into Unity Catalog.

Currently I am doing the following, but I am looking for a more optimized, best-practice approach:

CREATE TABLE IF NOT EXISTS catalog.schema.table
USING DELTA
LOCATION 'gs://bucket/tables/UUID';

amirabedhiafi
Visitor

 

Hello muaaz!

I don't think that's the best pattern to follow. I recommend avoiding registering tables directly from gs://... paths; instead, put the data under a UC external location backed by a storage credential. According to the docs, external table paths should live inside an existing external location, and it is better to organize this as one external location per schema for external table use cases.

Ideally, you should also use managed tables by default, unless you specifically need in-place access to existing data or direct access from non-Databricks tools.

So try:

 

CREATE TABLE catalog.schema.table
USING DELTA
LOCATION 'gs://my-bucket/external/schema_name/table_name';

with that location living inside a precreated UC external location such as:

CREATE EXTERNAL LOCATION ext_schema_name
URL 'gs://my-bucket/external/schema_name/'
WITH (STORAGE CREDENTIAL gcs_credential);

https://docs.databricks.com/gcp/en/connect/unity-catalog/cloud-storage

 

Ashwin_DSA
Databricks Employee

Hi @muaaz,

On GCP Databricks, the SQL pattern you are using is fine, but the recommended best practice is to back it with a Unity Catalog external location instead of pointing tables directly at arbitrary gs:// paths.

In practice, that means first creating a storage credential and an external location over the relevant GCS prefix (for example, gs://bucket/tables) and granting CREATE EXTERNAL TABLE on that external location. You then register each table with CREATE TABLE catalog.schema.table USING DELTA LOCATION 'gs://bucket/tables/UUID_or_table_path', where the path sits under that external location and is dedicated to a single table.

Databricks generally recommends using Unity Catalog managed tables for long-term workloads, so a common migration pattern is to initially register existing Delta data as external tables for a fast cutover, and then, once things are stable, convert those external tables to managed tables so that Unity Catalog controls the storage location, lifecycle, and automatic optimisations going forward.
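One way to sketch the external-to-managed conversion step described above is a deep clone into a new managed table (no LOCATION clause, so Unity Catalog owns the storage), followed by dropping the external registration once the clone is verified. This is only one possible approach, and the helper and names below are illustrative; on Databricks you would run each statement with spark.sql:

```python
def external_to_managed_ddl(catalog, schema, table, suffix="_managed"):
    """Build the statements for a deep-clone-based conversion of an
    external Delta table into a managed one. Returns the statements
    in execution order."""
    src = f"{catalog}.{schema}.{table}"
    dst = f"{catalog}.{schema}.{table}{suffix}"
    return [
        # No LOCATION clause: Unity Catalog picks the managed storage path.
        f"CREATE TABLE {dst} DEEP CLONE {src}",
        # Run only after the clone has been validated.
        f"DROP TABLE {src}",
    ]
```

The external table stays queryable until the final DROP, so the cutover window per table is small.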

If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***

muaaz
Visitor

Hi @Ashwin_DSA,

Thanks for the reply.

The method you proposed sounds fine, but we are dealing with a very large volume of data: around 3 schemas, ~50 tenants, and over 100 tables. Since this data is being migrated from Azure to GCP, we would prefer to avoid writing or managing scripts manually, especially as it may introduce additional validation overhead.

To give more context, the data has already been copied from Azure to GCP, and now we need to register the tables in the catalog.

I would really appreciate it if you could suggest an alternative approach that is more efficient, saves time, and minimizes manual involvement.

Ashwin_DSA
Databricks Employee

Hi @muaaz,

Given how much data you have and the fact that it's already in GCS, there unfortunately isn't a built-in "auto-discover and register all Delta tables" button in Unity Catalog. You always need some automation to translate the storage layout into table metadata.

The most time- and risk-efficient pattern I would recommend is to standardise once on a generic registration job, and then let that job do all the work for all schemas/tenants.

You define a clear directory convention on GCS (for example gs://bucket/<schema>/<tenant>/<table>), configure a single external location over the relevant root, and run one Databricks job/notebook that walks that directory tree, detects every _delta_log, infers the catalog/schema/table name from the path, and issues CREATE TABLE … USING DELTA LOCATION 'gs://…' automatically for each match.

That way you are not hand-writing or manually validating 100+ DDLs. Your only manual input is the base path and a small mapping (for example which UC catalog each source schema should land in), and the rest is repeatable and idempotent.
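A minimal local sketch of that discovery logic, assuming the <schema>/<tenant>/<table> layout (the function name and the tenant-to-table-name mapping are illustrative; on Databricks you would list the gs:// paths, e.g. with dbutils.fs.ls, and execute each generated statement with spark.sql):

```python
import os

def discover_delta_tables(root, catalog):
    """Walk a directory tree, treat every directory containing a
    _delta_log folder as a Delta table, and generate a CREATE TABLE
    statement for each, inferring names from root/<schema>/<tenant>/<table>."""
    ddls = []
    for dirpath, dirnames, _ in os.walk(root):
        if "_delta_log" in dirnames:
            rel = os.path.relpath(dirpath, root)
            parts = rel.split(os.sep)   # expected: [schema, tenant, table]
            if len(parts) != 3:
                continue                # skip paths outside the convention
            schema, tenant, table = parts
            ddls.append(
                f"CREATE TABLE IF NOT EXISTS {catalog}.{schema}.{tenant}_{table} "
                f"USING DELTA LOCATION '{dirpath}'"
            )
            dirnames.remove("_delta_log")  # do not descend into the log itself
    return sorted(ddls)
```

Because every statement is CREATE TABLE IF NOT EXISTS, the job is idempotent and safe to rerun after partial failures.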

As an alternative for future migrations where you want to avoid dealing with paths at all, you can keep the source on Azure as the system of record, expose those tables via Delta Sharing, and on GCP Databricks materialize them directly as managed Unity Catalog tables. That shifts the problem from "register many external Delta directories" to a controlled, table-driven replication flow without custom per-table scripts. Appreciate this may not work in all scenarios but worth considering.

If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***

Alright, thank you so much for your support @Ashwin_DSA. It was really helpful.

I will definitely mark it as "Accepted as Solution." I truly appreciate your valuable time.