Hi
We have a couple of Unity Catalog catalogs. In each are schemas, some used, some old, some named temp_data and the like. Within those schemas are tables that follow the same erratic approach. The result of a couple of years of bad housekeeping (who hasn't experienced that before?).
We now have an initiative to tidy all this up as we merge a couple of departments.
The plan is to create a couple of new catalogs and new schemas within those, then cherry-pick the good stuff from the old mess. The first stage of this operation has been to identify the tables we want to keep and set out what the new schemas will be called.
The big question in front of us is how best to go about porting the old stuff to the new destination. As described, it's not a simple 1:1 lift-and-shift: table "xxx" may well become table "abc" and live in a different schema from its previous home.
The data in the old schemas is live and will continue to be updated while the new catalogs are created. There is a requirement to keep the old and new in sync. Maybe not instant synchronization like database replication would give, but perhaps the new is brought up to date by a refresh every day - or whenever we instruct it to do so!
By now it will probably be obvious to you that at some point all the notebooks and reports that use the current data will be switched to point at the new locations. Maybe this will be achieved by a migration branch of the code in git, with all the reference paths updated to point at the new tidy catalogs. At the requisite "go go go" moment, the new tables will be brought up to date, we'll merge the migration branch into master, and the new world will be born. (The old catalogs will rust and die, eventually to be deleted.)
So, I'm seeking advice on the best strategy to do all this. I could embark on a roll-my-own method: scripting out the tables, tweaking the scripts to the new names, and creating the new tables, then producing a set of SQL statements that truncate each new table and insert a fresh select from the old one in order to refresh the data.
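For a single table, that would boil down to something like this - a minimal sketch as run from a notebook, where all the catalog/schema/table names are invented for illustration:

```python
# `spark` is the SparkSession every Databricks notebook provides.
# Hypothetical names: old_cat.temp_data.xxx becomes new_cat.finance.abc.

# One-off: create the renamed table, copying structure only (WHERE 1 = 0
# selects no rows, so we get an empty table with the source's columns).
spark.sql("""
    CREATE TABLE IF NOT EXISTS new_cat.finance.abc
    AS SELECT * FROM old_cat.temp_data.xxx WHERE 1 = 0
""")

# Daily (or on-demand) refresh: wipe the copy and reload from the live source.
spark.sql("TRUNCATE TABLE new_cat.finance.abc")
spark.sql("""
    INSERT INTO new_cat.finance.abc
    SELECT * FROM old_cat.temp_data.xxx
""")
```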
Although - with care - this would work, it feels clunky and error-prone. So... my big question and pivotal point of learning is...
Is there a more elegant, "Databricks-approved/advised" approach to take?
Something where I would set up a configuration file/table to instruct:
"Databricks.. please replicate this table from here to here, and keep the two in sync daily at this time. until I instruct you to stop"
Please help before I roll my own solution and then find out there was a better way all along!