09-10-2024 02:05 PM
I am receiving the following error when running the classroom setup for lesson 1.2 of Data Engineering with Databricks 3.1.12.
This has been tested with multiple accounts, both admins and non-admins.
Below is the error message I am receiving.
%run ./Includes/Classroom-Setup-01.2
AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
My Databricks workspace is deployed in Azure with VNet injection.
I believe the issue is with access to the hive_metastore. When I view the hive_metastore in Catalog Explorer using compute in my tenant (both SQL Warehouse and All-Purpose Compute), the default schema is not visible, but it is visible when I use a Serverless SQL Warehouse.
I found the following post with a similar issue and attempted to run the commands suggested without success. https://community.databricks.com/t5/data-engineering/unable-to-instantiate-hive-meta-store-client/td...
%sh sudo service hive-metastore status
Unit hive-metastore.service could not be found.
09-10-2024 02:16 PM
Hi @JR61276126 ,
Since your workspace is deployed in Azure with VNet injection, I assume it might be a network/firewall-related issue. Could you also check your driver logs?
09-11-2024 06:17 AM
Hello @szymon_dybczak ,
Looking at the driver logs, it appears to be an issue connecting to consolidated-eastusc3-prod-metastore-0.mysql.database.azure.com.
In researching this endpoint, I found the following document outlining access points for Azure Databricks with that being the endpoint for the Metastore. IP addresses and domains for Azure Databricks services and assets - Azure Databricks | Microsoft Lea...
I understand that to resolve the issue I need to open access to that endpoint, but first I have a question: why does it need to connect to a MySQL endpoint, and what is stored there? We implemented VNet injection because we want to keep our instance private, so if data is being stored outside of our tenant, that is a potential risk.
09-11-2024 06:39 AM
Hi @JR61276126 ,
Yeah, just like I thought. To answer your second question: Databricks hosts the Hive metastore in a MySQL database, which is why you need to allow an outbound connection to it.
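To confirm whether the cluster can actually reach the metastore, a plain TCP check from a notebook cell is usually enough. Below is a minimal sketch: the hostname is the one from the driver logs above, port 3306 is an assumption (the MySQL default), and `can_connect` is just an illustrative helper name, not a Databricks API.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the hostname and opens a TCP socket;
        # the context manager closes it again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, timeouts and blocked routes.
        return False


# Hostname taken from the driver logs in this thread.
METASTORE_HOST = "consolidated-eastusc3-prod-metastore-0.mysql.database.azure.com"
# Assumption: the metastore listens on the default MySQL port.
METASTORE_PORT = 3306
```

Running `can_connect(METASTORE_HOST, METASTORE_PORT)` on the affected cluster should return `False` while the outbound rule is missing and `True` once the connection is allowed. It only tests network reachability, not authentication against the metastore.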
09-12-2024 07:55 AM
Hi @szymon_dybczak, is there any way around that? We plan to use Unity Catalog rather than hive_metastore, but we need hive_metastore so that our staff can go through the Databricks-provided learning content.
If we were to open this port to allow connections to this endpoint, what type of data is stored there? I would need to justify this change to our security teams, so I need to better understand the purpose and contents of that endpoint.
09-12-2024 08:11 AM
Hi @JR61276126 ,
The metastore only stores metadata, such as column names and data types. Your actual data is stored in your storage account. If this is only for learning purposes, I would say you have nothing to worry about from a security perspective.