cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Migrating external tables to managed tables from HMS to UC

yanchr
New Contributor II

I think the easiest way to do that is to use DEEP CLONE. However, since the SET MANAGED approach was introduced in DBR 17, wouldn't it be better to first migrate the table as external and then convert it to managed using SET MANAGED? The Databricks article indicates that it shows great performance. What is the best practice?

2 REPLIES 2

Lu_Wang_ENB_DBX
Databricks Employee
Databricks Employee

Where SET MANAGED is supported, it has replaced DEEP CLONE as the primary migration path for UC Delta tables. DEEP CLONE is now more of a fallback tool.

Below are 3 options.

  1. UC external โ†’ UC managed with ALTER TABLE โ€ฆ SET MANAGED (DBR 17+) โ€“ preferred

    • Databricks explicitly recommends SET MANAGED over CTAS/DEEP CLONE for converting UC external tables to managed because it:
      • Preserves full table history.
      • Minimizes downtime using a two-phase background copy (typical writer downtime ~1โ€“5 minutes, readers often no downtime).
      • Keeps the same table name, permissions, comments, configs, and adds path-based redirect for legacy code.
    • Best practice: If you can get your table into UC external Delta, then run SET MANAGED. That is the โ€œeasy buttonโ€.
  2. UC managed โ†’ UC external (same location) โ†’ SET MANAGED in new managed location โ€“ advanced but strongest semantics

    • For moving existing managed tables to a new managed location while preserving full history and supporting concurrent writes with ~1โ€“2 min downtime, the pattern of managed โ†’ external โ†’ ALTER TABLE โ€ฆ SET MANAGED is an alternative to pure DEEP CLONE, assuming DBR 17+ and the catalog already has the desired managed location configured.
    • This is more operationally complex than a straight DEEP CLONE, but you get the SET MANAGED benefits (history, minimal downtime, no namespace change).
  3. DEEP CLONE โ€“ still useful in some cases

    • Still a good choice when:
      • Youโ€™re changing format (e.g., Parquet โ†’ Delta) or coming from non-UC / foreign sources where SET MANAGED doesnโ€™t apply.
      • You want a one-off copy (e.g., DR dry runs, sandboxing, moving between catalogs/metastores where you actually want a new table and donโ€™t need history continuity).
    • Trade-offs: duplicates data, does not preserve version history across the move, and generally has more visible downtime than SET MANAGED in the externalโ†’managed scenarios.

Recommendation:
Given youโ€™re on DBR 17+ and talking about tables that can be represented as UC external Delta, the current best practice is:

  • Prefer a SET-MANAGEDโ€“based path over a pure DEEP CLONE, especially when you care about:
    • preserving table history,
    • minimizing writer/reader downtime, and
    • avoiding duplicate data and code changes.

So you are right: โ€œfirst migrate as external, then SET MANAGED to managedโ€ is now the better option than relying solely on DEEP CLONE, and DEEP CLONE should be reserved for cases where SET MANAGED simply doesnโ€™t apply or you intentionally want a fresh copy.

amirabedhiafi
New Contributor III

Hello @yanchr !

Yes if the table can first be registered as a UC external table, then ALTER TABLE ... SET MANAGED is now the better practice than doing a direct DEEP CLONE especially on DBR 17+ / serverless.

It is recommended to use SET MANAGED for converting UC external tables to managed tables because it minimizes reader or writer downtime, handles concurrent writes, keeps the same table name/configuration/permissions/views, retains table history and supports rollback to external within 14 days.  It also copies data in two phases: first with no downtime, then a short switch step which is why downtime is usually lower than DEEP CLONE. 

So I would use this rule ๐Ÿ™‚

You can use direct DEEP CLONE mainly when you cannot use SYNC/external registration for example hive managed tables stored in workspace storage/DBFS root or cases where you intentionally want a new independent managed copy.

If this answer resolves your question, could you please mark it as โ€œAccept as Solutionโ€? It will help other users quickly find the correct fix.

Senior BI/Data Engineer | Microsoft MVP Data Platform | Microsoft MVP Power BI | Power BI Super User | C# Corner MVP