Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

Unity Catalog read issue

Etyr
Contributor

Hello,

Our company is running a proof of concept (POC) of Unity Catalog on Azure.

We have two subscriptions, each containing one Databricks workspace and one ADLS Gen2 storage account.

Initially, we had the default `hive_metastore` connected to the ADLS Gen2 account. I created a secret scope and, in the SQL/Spark configuration, added all the authentication information for the storage account. This works: we have schemas/tables stored on this Azure storage account.
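For readers following along, here is a minimal sketch of what such a configuration might look like. It assumes service principal (OAuth) authentication, which the post does not confirm, and every name passed in (storage account, tenant ID, secret scope, secret keys) is a hypothetical placeholder. The `{{secrets/<scope>/<key>}}` form is how a cluster Spark configuration references a secret scope; depending on where the configuration is entered, the keys may also need a `spark.hadoop.` prefix.

```python
def secret_ref(scope, key):
    """Build a secret reference as written in a cluster Spark config."""
    return "{{secrets/" + scope + "/" + key + "}}"


def adls_oauth_spark_conf(storage_account, tenant_id, scope,
                          client_id_key, client_secret_key):
    """Return Spark config entries for OAuth access to one ADLS Gen2 account.

    Assumption: a service principal is used. All argument values are
    placeholders chosen by the caller, not values from the original post.
    """
    suffix = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}":
            secret_ref(scope, client_id_key),
        f"fs.azure.account.oauth2.client.secret.{suffix}":
            secret_ref(scope, client_secret_key),
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def to_config_lines(conf):
    """Render the entries as 'key value' lines for the Spark config box."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())
```

Calling `to_config_lines(adls_oauth_spark_conf(...))` yields text that can be pasted into the cluster configuration. Note that this per-account setup only covers `hive_metastore` access; Unity Catalog reaches storage through its own storage credentials instead.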

With Unity Catalog, we want to share data between the two Databricks workspaces. So we created a Databricks access connector and a new container inside the same storage account. We will therefore have to migrate the data from the first container, used by `hive_metastore`, to the new container created for the access connector.

We were able to create a catalog, and through the UI I was able to create a new schema inside it. I was also able to create a table and insert data (although inserting 3-4 rows seemed quite slow).

Next, I went to the other Databricks workspace in the other subscription and configured it to access the catalog. But when I run a SELECT, the query neither finishes nor fails. It stays stuck in `running`.

In both workspaces A and B, I'm connected with the same account, and I'm the owner of the schema/table, so this should not be a permissions issue.

In workspace A, I have the Spark configuration to access the storage account of Subscription A.

In workspace B, I have the Spark configuration to access the storage account of Subscription B, but not the configuration to access the storage account of Subscription A. Could this be the issue? If so, I could just as well have created the schema/table directly in `hive_metastore` without Unity Catalog, so I assume this is not the cause.


3 REPLIES

Etyr
Contributor

I think we need to create private endpoints between Databricks A and Storage B so that the catalog from B is accessible from Databricks A.

But in our case this is not possible, because our security team will not allow such an architecture.
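One way to test the network hypothesis above is to probe the storage endpoints from a notebook running in the workspace where the query hangs. The sketch below uses only the Python standard library; the storage account name passed to `storage_hosts` is a placeholder, and the function names are illustrative. A DNS failure or a connect timeout would point to a missing network route or private endpoint rather than a permissions problem.

```python
import ipaddress
import socket
import time


def storage_hosts(storage_account):
    """Return the ADLS Gen2 (dfs) and blob hostnames for a storage account."""
    return [
        f"{storage_account}.dfs.core.windows.net",
        f"{storage_account}.blob.core.windows.net",
    ]


def probe_endpoint(host, port=443, timeout=5.0):
    """Resolve a host and attempt a TCP connection, bounded by `timeout`."""
    result = {"host": host, "port": port, "resolved_ip": None,
              "private_ip": None, "reachable": False, "error": None}
    try:
        ip = socket.gethostbyname(host)
    except socket.gaierror as exc:
        result["error"] = f"DNS resolution failed: {exc}"
        return result
    result["resolved_ip"] = ip
    # A private address suggests the name resolves to a private endpoint.
    result["private_ip"] = ipaddress.ip_address(ip).is_private
    start = time.monotonic()
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            result["reachable"] = True
    except OSError as exc:  # includes timeouts and refused connections
        result["error"] = f"TCP connect failed: {exc}"
    result["elapsed_s"] = round(time.monotonic() - start, 3)
    return result


if __name__ == "__main__":
    # "mystorageaccount" is a placeholder, not a name from this thread.
    for hostname in storage_hosts("mystorageaccount"):
        print(probe_endpoint(hostname))
```

This only checks name resolution and TCP reachability on port 443. It says nothing about whether the access connector is authorized on the storage account.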

chinumari
New Contributor II

Where is the UC catalog container sitting: Sub A or Sub B? As long as the access connector has RBAC on the storage account and an external location is defined in UC, you should be able to connect. Please review.
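As a rough aid for the review suggested here, the sketch below builds the `abfss://` URL that an external location would point to and a few Databricks SQL statements for inspecting the Unity Catalog side. The container, storage account, and external location names are hypothetical. The Azure RBAC assignment on the storage account (typically a role such as Storage Blob Data Contributor granted to the access connector's managed identity) has to be checked in Azure itself and is not covered by these statements.

```python
def abfss_url(container, storage_account, path=""):
    """Build the abfss:// URL for a container, optionally with a sub-path."""
    base = f"abfss://{container}@{storage_account}.dfs.core.windows.net"
    cleaned = path.strip("/")
    return f"{base}/{cleaned}" if cleaned else base


def review_statements(external_location):
    """Return SQL statements that inspect UC storage objects.

    `external_location` is a placeholder name supplied by the caller.
    """
    name = f"`{external_location}`"
    return [
        "SHOW STORAGE CREDENTIALS",
        "SHOW EXTERNAL LOCATIONS",
        f"DESCRIBE EXTERNAL LOCATION {name}",
        f"SHOW GRANTS ON EXTERNAL LOCATION {name}",
    ]


# In a Databricks notebook, where `spark` is predefined, one might run:
#   for stmt in review_statements("prd_location"):
#       spark.sql(stmt).show(truncate=False)
```

Comparing the URL reported by `DESCRIBE EXTERNAL LOCATION` with the output of `abfss_url` is a quick way to confirm that the location points at the intended container.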

Etyr
Contributor

Hello,

The UC catalog container is sitting in Sub B.

Basically, we have a hub-and-spoke configuration, with each subscription as a spoke. Each subscription can access any resource inside its own subscription through private endpoints (PE). But to access a resource in another subscription, we have to request authorization, and the traffic will go from A to B through the hub.

In our case, A is a DEV environment with DEV data, and B is a production environment with PRD data.

Today we copy data from Storage B to Storage A using ADF and a self-hosted integration runtime (SHIR) that has access to both A and B. But this process is slow and very costly.
We need PRD data in the DEV environment so that data scientists can work with real data rather than sample data. They will never have write access to the PRD catalog/database/tables.
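The read-only requirement described here could be expressed with Unity Catalog privileges along the following lines. This is only a sketch: the catalog, schema, and group names are hypothetical, and the statements grant read access at schema level, which may be broader or narrower than what is actually wanted.

```python
def read_only_grants(catalog, schema, principal):
    """Return GRANT statements giving a principal read-only access to a schema.

    All three names are placeholders supplied by the caller.
    """
    grantee = f"`{principal}`"
    target_schema = f"`{catalog}`.`{schema}`"
    return [
        # Allows the principal to see and use the catalog at all.
        f"GRANT USE CATALOG ON CATALOG `{catalog}` TO {grantee}",
        # Allows the principal to use the schema inside the catalog.
        f"GRANT USE SCHEMA ON SCHEMA {target_schema} TO {grantee}",
        # Read access to tables in the schema; no MODIFY is granted.
        f"GRANT SELECT ON SCHEMA {target_schema} TO {grantee}",
    ]


# In a Databricks notebook, where `spark` is predefined, one might run:
#   for stmt in read_only_grants("prd", "sales", "data-scientists"):
#       spark.sql(stmt)
```

Because no `MODIFY` privilege is granted, members of the group would be able to query the tables but not write to them.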

This is why we thought we could share data from PRD to DEV with Unity Catalog. We also do not want to use Delta Sharing, since it goes over the public internet.
