Unity Catalog read issue
06-06-2024 01:38 AM
Hello,
Our company is running a POC of Unity Catalog with Azure as the provider.
We have two subscriptions, each containing one Databricks workspace and one ADLS Gen2 storage account.
Initially we have the default `hive_metastore` connected to the ADLS Gen2. I've created a secret scope and, in the SQL/Spark configuration, added all the authentication information for the storage account. This works: we have schemas/tables on this Azure storage account.
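For illustration, the configuration looks roughly like this (all names here are placeholders, not our real ones: secret scope `poc-scope`, storage account `storagesuba`, and a service principal whose secret is stored in the scope):

```python
# Placeholder names: "poc-scope", "storagesuba", "<application-id>", "<tenant-id>".
service_credential = dbutils.secrets.get(scope="poc-scope", key="sp-client-secret")

spark.conf.set("fs.azure.account.auth.type.storagesuba.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.storagesuba.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.storagesuba.dfs.core.windows.net",
               "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.storagesuba.dfs.core.windows.net",
               service_credential)
spark.conf.set("fs.azure.account.oauth2.client.endpoint.storagesuba.dfs.core.windows.net",
               "https://login.microsoftonline.com/<tenant-id>/oauth2/token")
```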
With Unity Catalog we want to share data between the two Databricks workspaces. So we created a Databricks access connector and a new container inside the same storage account. We will therefore have to migrate the data from the first container used by `hive_metastore` to the new container created for the access connector.
We were able to create a catalog, and through the UI I created a new schema inside this catalog. I was also able to create a table and insert data (even if inserting 3-4 rows seems quite slow).
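A minimal sketch of those steps in SQL, again with placeholder names (catalog `shared_catalog`, container `uc-container`, storage account `storagesubb` behind the access connector):

```python
# Placeholder names throughout; the MANAGED LOCATION must be covered by an
# external location bound to the access connector's storage credential.
spark.sql("""
    CREATE CATALOG IF NOT EXISTS shared_catalog
    MANAGED LOCATION 'abfss://uc-container@storagesubb.dfs.core.windows.net/'
""")
spark.sql("CREATE SCHEMA IF NOT EXISTS shared_catalog.poc_schema")
spark.sql("CREATE TABLE IF NOT EXISTS shared_catalog.poc_schema.demo (id INT, name STRING)")
spark.sql("INSERT INTO shared_catalog.poc_schema.demo VALUES (1, 'alice'), (2, 'bob')")
```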
Now, on the Databricks workspace in the other subscription, I've done the configuration to access the catalog. But when running a SELECT, the query neither finishes nor fails; it stays stuck in `running`.
In both workspace A and workspace B I'm connected with the same account, and I'm the owner of the schema/table, so it's not a role/permission issue.
In workspace A I have the Spark configuration to access the storage account of subscription A.
In workspace B I have the Spark configuration to access the storage account of subscription B, but I don't have the configuration to access the storage account of subscription A. Could this be the issue? If so, I could have created the schema/table directly in the `hive_metastore` without Unity Catalog, so I guess it's not this.
06-06-2024 02:26 AM
I think we need to create private endpoints between Databricks A and Storage B so that the catalog from B is accessible from Databricks A.
But in our case that is not possible, because our security team will not allow such an architecture.
06-06-2024 02:27 AM
Where is the UC catalog container sitting, Sub A or Sub B? As long as the access connector has RBAC on the storage account and an external location is defined in UC, you should be able to connect. Please review.
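A rough sketch of that setup, with placeholder names; the storage credential `uc_connector_cred` wrapping the access connector's managed identity is assumed to already exist (it is typically created in Catalog Explorer or via the account API):

```python
# Placeholder names. The access connector needs the "Storage Blob Data
# Contributor" role on the storage account; the external location then binds
# its storage credential to the container holding the catalog's data.
spark.sql("""
    CREATE EXTERNAL LOCATION IF NOT EXISTS uc_container_loc
    URL 'abfss://uc-container@storagesubb.dfs.core.windows.net/'
    WITH (STORAGE CREDENTIAL uc_connector_cred)
""")

# Quick end-to-end check that the credential and RBAC are wired up:
dbutils.fs.ls("abfss://uc-container@storagesubb.dfs.core.windows.net/")
```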
06-06-2024 02:33 AM - edited 06-06-2024 02:56 AM
Hello,
The UC catalog container is sitting in Sub B.
Basically we have a hub-and-spoke configuration for each subscription. Each subscription can access any resource inside its own subscription through some private endpoints. But to access another subscription's resources, we have to request an authorization, and the traffic goes from A to B through the hub.
In our case, A is a DEV environment with DEV data, and B is a production environment with PRD data.
Today we copy data from Storage B to Storage A with ADF and a self-hosted integration runtime (SHIR); the SHIR has access to both A and B. But this process is slow and very costly.
We need PRD data in the DEV environment, so that data scientists work with real data and not sample data. They will never have write access to the PRD catalog/databases/tables.
This is why we thought we could share data from PRD to DEV with Unity Catalog. We also do not want to use Delta Sharing, since it goes over the Internet.

