
Operations on Unity Catalog take too long

breaka
New Contributor III

Hi!

We are currently PoC-ing Databricks with Unity Catalog on AWS, but it seems there are some issues.
Creating a database in an existing (Unity) catalog takes over 10 minutes. Creating an external table on top of an existing Delta table (`CREATE TABLE main.bronze.dummy_table USING DELTA LOCATION 's3://<CLOUD_URI>/dummy_data.delta';`) also takes 10+ minutes; the cell's status is stuck at `Performing Hive catalog operation: databaseExists`. Working directly on tables via paths is snappy, as expected.
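For reference, a minimal notebook sketch of what we see (the `<CLOUD_URI>` placeholder is the same as above, and `spark` is the notebook's built-in SparkSession):

```python
import time

# Hypothetical repro sketch; <CLOUD_URI> and the three-level table name are
# placeholders, and `spark` is the SparkSession predefined in the notebook.
delta_path = "s3://<CLOUD_URI>/dummy_data.delta"

# Reading the Delta table directly by path comes back within seconds.
t0 = time.time()
spark.read.format("delta").load(delta_path).limit(10).collect()
print(f"path-based read: {time.time() - t0:.1f}s")

# The equivalent Unity Catalog DDL hangs for 10+ minutes, stuck at
# "Performing Hive catalog operation: databaseExists".
t0 = time.time()
spark.sql(f"CREATE TABLE main.bronze.dummy_table USING DELTA LOCATION '{delta_path}'")
print(f"CREATE TABLE via Unity Catalog: {time.time() - t0:.1f}s")
```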

The logs show a lot of `java.sql.SQLNonTransientConnectionException: Could not connect to mdv2llxgl8lou0.ceptxxgorjrc.eu-central-1.rds.amazonaws.com:3306 : Connection reset` errors. However, this doesn't seem to be a network issue, as we can reach this URL/port via bash in the web terminal. The cluster's event log also shows multiple `metastore is down` messages.
Not sure if it's related, but even though our Unity Catalog is configured to store its data in a specific S3 directory (also shown via `DESCRIBE CATALOG EXTENDED`), this directory is still empty. Our metastore does NOT have managed storage assigned, so I'm not even sure where it actually stores the database/table metadata!?
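Roughly how we checked (the catalog name and S3 URI below are stand-ins for our actual ones):

```python
# Show the catalog's storage root as reported by Unity Catalog.
spark.sql("DESCRIBE CATALOG EXTENDED main").show(truncate=False)

# Listing that storage root: nothing has been written there so far.
# (The S3 URI is a placeholder for the directory configured as storage root.)
print(dbutils.fs.ls("s3://<CLOUD_URI>/unity-catalog-root/"))
```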

Almost forgot: the cluster runs on DBR 13.3.

Does anyone have an idea what could be the issue here?
Thanks!

3 REPLIES

breaka
New Contributor III

Hi @Retired_mod ,

thank you for your reply!

> Unity Catalog Configuration

We configured the metastore, workspace, and catalog to the best of our knowledge, following Databricks' documentation. The Databricks runtime and AWS itself should be fully supported.

> Metastore Health (Consider restarting or verifying the health of the metastore service)

AFAIK, the error message is related to the legacy Hive metastore at the mdv2llxgl8lou0.ceptxxgorjrc.eu-central-1.rds.amazonaws.com address, which is hosted and maintained centrally by Databricks. Nothing we can do here.

> Storage Credentials and Locations

Testing the external location that "should" hold the Unity Catalog data via the data catalog web UI shows: "All Permissions Confirmed. The associated Storage Credential grants permission to perform all necessary operations." We successfully use the same storage credentials for external volumes on the same S3 bucket (though in a different sub-folder).

> Connection Troubleshooting

I'm not sure how we can set any credentials for the legacy Hive metastore. Shouldn't this be fully managed by Databricks (via keystore)?

> Since you can reach the URL/port via the web terminal, consider checking the security group rules and firewall settings

With respect to security group rules and firewalls, is there a difference between making a network connection via Spark (JVM) and via bash/Python if it is the very same VM/container? I can also successfully create a socket with Python or shell (%sh) within a Databricks notebook.
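For example, a cell like this succeeds on the very same cluster that logs the connection errors; it is only a plain TCP connect, not a full MySQL handshake:

```python
import socket

# Open a raw TCP connection from the driver to the metastore host/port
# that the SQLNonTransientConnectionException complains about.
host = "mdv2llxgl8lou0.ceptxxgorjrc.eu-central-1.rds.amazonaws.com"
with socket.create_connection((host, 3306), timeout=5) as s:
    print("TCP connection established:", s.getpeername())
```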

> Metadata Storage Location

I fully agree that "understanding its behavior is crucial", but apparently I'm missing something here. I just created a catalog and set its storage root to an S3 directory where the GUI shows that we have full access.
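For completeness, this is roughly how the catalog was set up (the catalog name and S3 URI are stand-ins):

```python
# Sketch of the catalog creation; MANAGED LOCATION is what sets the storage
# root that DESCRIBE CATALOG EXTENDED later reports.
spark.sql("""
    CREATE CATALOG IF NOT EXISTS main
    MANAGED LOCATION 's3://<CLOUD_URI>/unity-catalog-root/'
""")
```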

I've replied to each of your potential solutions; I hope this clears things up a bit.

Thanks!

breaka
New Contributor III

PS: Apparently I'm not allowed to use the word H E A L T H (without spaces) in my reply ("The message body contains H e a l t h, which is not permitted in this community. Please remove this content before sending your post.")

DB_Paul
Databricks Employee

This word has now been whitelisted, thank you for the tip!


Head of Community, Databricks
