Administration & Architecture

How to migrate catalog storage account to another storage account

Marco37
Contributor III

Hi,

we have an Azure Databricks workspace that uses Unity Catalog for storing data. We use a separate storage account to store the catalogs. We need to enable the option "Infrastructure encryption" on this storage account, and unfortunately that is only possible during creation of a storage account.

Our plan is:

  1. create a new storage account
  2. stop all compute clusters and disable jobs (it is a very small environment)
  3. copy the data to that new storage account with Azure Storage Explorer (copy blob container)
  4. re-create the current storage account with the same name
  5. copy the data back to this storage account with Azure Storage Explorer (copy blob container)
  6. enable jobs

I did notice that the file dates of the files on the storage account change to the current date after the copy activity. Is this a problem for Databricks, or are these file dates not used because the lineage is stored in Databricks tables?

I'm an infra person, not a data engineer.

Regards Marco

5 REPLIES

KrisJohannesen
Contributor

Hi Marco

Are you using managed or external tables?

This should be easy to see either in Unity Catalog, or by checking in your storage account whether the folders are ID-based or have actual table names (e.g. dim_customer or something).
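
For example, something like this would show it (a rough sketch for a Databricks notebook; the catalog and table names such as main.sales.dim_customer are placeholders):

# Rough sketch - run in a Databricks notebook, where `spark` is already defined.
# Per table: the "Type" row shows MANAGED or EXTERNAL, "Location" shows the storage path.
spark.sql("DESCRIBE TABLE EXTENDED main.sales.dim_customer").show(truncate=False)

# Whole catalog at once, via the Unity Catalog information schema.
spark.sql("""
    SELECT table_schema, table_name, table_type, data_source_format
    FROM main.information_schema.tables
    ORDER BY table_schema, table_name
""").show(truncate=False)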

Either way, I would move the data to the temporary storage layer using a deep clone. This ensures you keep all the internal references, such as the Delta log and the Unity Catalog metadata - those will break if you simply copy the files in storage directly.
Since the solution is not that big, the cost won't be significant.

https://docs.databricks.com/aws/en/delta/clone

The main difference is the cleanup: for managed tables you can simply drop the tables/schemas in UC, while for external tables you need to go and physically delete the files in the storage account.
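
A minimal sketch of what the clone step could look like (assuming a temporary catalog, here called tmp_catalog, whose managed location sits on the temporary storage account; all names are placeholders):

# Rough sketch for a Databricks notebook (`spark` is predefined); names are placeholders.
# DEEP CLONE creates an independent copy of the table's data and metadata at the
# target location (note: the source table's version history is not carried over).
spark.sql("""
    CREATE OR REPLACE TABLE tmp_catalog.sales.dim_customer
    DEEP CLONE main.sales.dim_customer
""")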

lukaszmaron
Visitor

Hi Marco,

Haha, I actually went through a similar case 🙂

If your UC storage is Databricks-managed, it won't work the way you think. Copying blobs or containers will be a waste of time as far as I know - or put another way, Databricks won't recognize the data inside automatically. Databricks-managed Unity Catalog storage has "directories" like:

<container>/
└── __unitystorage/
    └── catalogs/
        └── <catalog-uuid>/
            └── schemas/
                └── <schema-uuid>/
                    └── tables/
                        └── <table-uuid>/
                            ├── _delta_log/
                            └── *.parquet

Catalogs and schemas are actually UUID-based, and this is the problematic part. So when you create an external location pointing to the UC root storage, it won't recognize it; it will skip it and basically look like it's empty...

This means you will need a pipeline to load this data, via WASBS or a UC-enabled external location, back into a "fresh" managed catalog.
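
If it does come to that, one option is to deep clone each old table by its physical path into the fresh managed catalog (a sketch only; the abfss path below is a made-up example of a UUID-based managed table location, and it would have to be reachable through a UC external location):

# Rough sketch for a Databricks notebook; the path and names are made-up examples.
# The old table's UUID-based folder must be reachable, e.g. via an external location.
old_path = (
    "abfss://container@oldaccount.dfs.core.windows.net/"
    "__unitystorage/catalogs/<catalog-uuid>/schemas/<schema-uuid>/tables/<table-uuid>"
)
spark.sql(f"""
    CREATE OR REPLACE TABLE new_catalog.sales.dim_customer
    DEEP CLONE delta.`{old_path}`
""")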

I was actually working on a tool to solve this specific issue - to copy this data nicely... If you'd like, I'm happy to share it with you via GitHub, for example. I tested it locally, but I would be more than happy to have someone test it out in practice. It's deep clone based too.

If the catalog is managed outside of Databricks, then it's a bit easier I'd say.

Let me know if you have any questions or need support.

Marco37
Contributor III

Thanks for the replies 🙂

We are using both managed and external tables.

Our environment is built up with 3 storage accounts:

  • ...wedls. This is the storage account on which I need to enable "Infrastructure encryption".
  • ...managedsa. This is the default storage account that Databricks creates during deployment. We do not use this storage account to store data.
  • ...wemetadls. No idea why we have this one, because it seems to be empty.

[screenshot]

Should the plan below work?

  1. create storage account B with private endpoints
  2. grant access connector permissions on storage account B
  3. create new catalogs with a different name on storage account B
  4. deep clone the tables from storage account A to storage account B
  5. remove the catalogs from storage account A
  6. delete storage account A
  7. recreate the storage account A with the same name
  8. grant access connector permissions on storage account A
  9. create the catalogs on storage account A with the names I want to use
  10. deep clone the tables from storage account B to storage account A
  11. remove the catalogs from storage account B
  12. delete storage account B
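
For steps 3 and 4, something along these lines (a rough sketch only; it assumes an external location for storage account B already exists, and the catalog names and paths are placeholders; external tables, whose files live at their own locations, aren't covered by this loop):

# Rough sketch for a Databricks notebook (`spark` is predefined); names/paths are placeholders.
# Step 3: create the new catalog with its managed storage on storage account B.
spark.sql("""
    CREATE CATALOG IF NOT EXISTS catalog_b
    MANAGED LOCATION 'abfss://catalogs@storageaccountb.dfs.core.windows.net/'
""")

# Step 4: deep clone every managed Delta table from the old catalog into the new one.
tables = spark.sql("""
    SELECT table_schema, table_name
    FROM catalog_a.information_schema.tables
    WHERE table_type = 'MANAGED' AND data_source_format = 'DELTA'
""").collect()

for t in tables:
    spark.sql(f"CREATE SCHEMA IF NOT EXISTS catalog_b.{t.table_schema}")
    spark.sql(f"""
        CREATE OR REPLACE TABLE catalog_b.{t.table_schema}.{t.table_name}
        DEEP CLONE catalog_a.{t.table_schema}.{t.table_name}
    """)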

Regards,

Marco

@Marco37 yea that sounds like the plan I would do.
... and then of course remember to pause any jobs/workflows and stuff on top before you get going - and restart it once you are done!

lukaszmaron
Visitor

@Marco37 yes, overall that sounds good.

Just remember some points:

  • Delta Lake transaction log history won't be copied (basically querying an older version / time travel won't work)
  • CDC history will be lost
  • Lineage can be lost after the 'catalog swap'
  • RLS/column masking stuff can't be deep cloned automatically; it needs some prerequisites
  • Streaming tables are not supported
  • Deep clone is a Delta Lake operation; it won't work for Parquet, CSV, JSON
  • + other limitations, etc.
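
To spot up front which objects the deep clone approach won't cover, something like this could help (a rough sketch; the catalog name is a placeholder):

# Rough sketch for a Databricks notebook (`spark` is predefined); the catalog name is a placeholder.
# Lists objects DEEP CLONE can't handle: views, materialized views, streaming tables,
# and anything not stored as Delta.
spark.sql("""
    SELECT table_schema, table_name, table_type, data_source_format
    FROM catalog_a.information_schema.tables
    WHERE table_type IN ('VIEW', 'MATERIALIZED_VIEW', 'STREAMING_TABLE')
       OR (data_source_format IS NOT NULL AND data_source_format <> 'DELTA')
""").show(truncate=False)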

I'm happy to discuss more details; I'm actually really interested in migration topics like this 😄

I guess each container you show represents a catalog?