
Accessing data from a legacy hive metastore workspace on a new Unity Catalog workspace

hossein_kolahdo
New Contributor II

Hello,

For testing purposes, I'm interested in creating a new workspace with Unity Catalog enabled, and from there I'd like to access (external - S3) tables on an existing legacy hive metastore workspace (not UC enabled). The goal is for both workspaces to point to the same underlying S3 external location.

As a requirement I do not want to duplicate data & ideally updates to data on the legacy workspace would be reflected to tables surfaced through UC.

I was considering the possibility of shallow cloning, however from my understanding that is not possible across UC & hive metastore.

Does anybody have experience with or recommendations for doing this? Looking through the Databricks documentation, I'm mostly finding information only on upgrading a legacy workspace.

#unitycatalog #hivemetastore 

3 REPLIES

hossein_kolahdo
New Contributor II

@Kaniz From looking at the documentation, none of it addresses the particular use case I described (2 workspaces on one account, 1 with UC and the other without). Was there a particular part of the docs that you think can help here?

MichTalebzadeh
Contributor

Your aim is to access external S3 tables from a Unity Catalog workspace without duplicating data, while keeping data updates synchronized. First, configure external location permissions. This ensures that both your Unity Catalog and Hive metastore workspaces have read permissions for the S3 location containing your tables, so both workspaces can access the same underlying data without duplication. Then create external tables in Unity Catalog with the 'CREATE EXTERNAL TABLE ..' syntax, specifying the S3 location and schema of the existing table. This creates pointers to the existing data in S3 without copying it. Remember that, whether in Hive or otherwise, external tables are just pointers, often used for ETL by overwriting the existing data (in your case on S3). Both Hive and Unity Catalog control the schema and point to the data location but do not control the data itself. You can then access the data from both Hive and Unity Catalog.
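As a rough illustration of the two steps described in this reply (registering the S3 path as an external location, then creating a table over it), here is a minimal Python sketch that only builds the SQL statements. Everything named in it is a placeholder assumed for illustration: the bucket `example-bucket`, the storage credential `example_credential`, the location name `legacy_tables_loc`, and the table `main.legacy.orders`. It also assumes the existing table is stored as Delta and that a storage credential has already been set up by an account admin. A table created with a LOCATION clause is external, so no data is copied.

```python
# Sketch only: builds the SQL text; it does not talk to Databricks.
# All names below (bucket, credential, catalog, schema, table) are
# hypothetical placeholders, not values from this thread.

def external_location_sql(name: str, url: str, credential: str) -> str:
    """SQL that registers an S3 path as a Unity Catalog external location."""
    return (
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS `{name}` "
        f"URL '{url}' "
        f"WITH (STORAGE CREDENTIAL `{credential}`)"
    )


def external_table_sql(full_name: str, url: str, fmt: str = "DELTA") -> str:
    """SQL that creates a UC table pointing at data that already exists in S3.

    The LOCATION clause makes the table external: only metadata is created.
    The DELTA default is an assumption about the legacy table's format.
    """
    return f"CREATE TABLE IF NOT EXISTS {full_name} USING {fmt} LOCATION '{url}'"


statements = [
    external_location_sql(
        "legacy_tables_loc",
        "s3://example-bucket/warehouse",
        "example_credential",
    ),
    external_table_sql(
        "main.legacy.orders",
        "s3://example-bucket/warehouse/orders",
    ),
]

for stmt in statements:
    # In a notebook on the UC-enabled workspace this could be spark.sql(stmt).
    print(stmt)
```

Since both metastores would only hold metadata for the same S3 path, it is worth testing how writes from the legacy workspace show up through the UC table before relying on this setup.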



HTH

Mich Talebzadeh | Technologist | Data | Generative AI | Financial Fraud
London
United Kingdom

view my Linkedin profile



https://en.everybodywiki.com/Mich_Talebzadeh



Disclaimer: The information provided is correct to the best of my knowledge but of course cannot be guaranteed. It is essential to note that, as with any advice, "one test result is worth one-thousand expert opinions" (Wernher von Braun).