Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.
Operations on Unity Catalog take too long

breaka
New Contributor II

Hi!

We are currently PoC-ing Databricks with Unity Catalog on AWS but it seems there are some issues.
Creating a database in an existing (Unity) catalog takes over 10 minutes. Creating an external table on top of an existing Delta table (CREATE TABLE main.bronze.dummy_table USING DELTA LOCATION 's3://<CLOUD_URI>/dummy_data.delta';) also takes 10+ minutes. The cell's status is stuck at `Performing Hive catalog operation: databaseExists`. Working directly on tables via paths is snappy, as expected.

The logs show a lot of `java.sql.SQLNonTransientConnectionException: Could not connect to mdv2llxgl8lou0.ceptxxgorjrc.eu-central-1.rds.amazonaws.com:3306 : Connection reset` errors. However, this doesn't seem to be a network issue, as we can reach this URL/port via bash in the web terminal. The cluster's event log also shows multiple `metastore is down` messages.
Not sure if it's related, but even though our Unity Catalog catalog is configured to store its data in a specific S3 directory (also shown via DESCRIBE CATALOG EXTENDED), this directory is still empty. Our metastore does NOT have a managed storage assigned, so I'm not even sure where it actually stores the database/table metadata!?
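For anyone trying to quantify the connection errors mentioned above, here is a rough Python sketch that tallies them from a driver log by host, port, and reason. The regular expression is modelled only on the single error line quoted in this post, and `tally_connection_errors` is a hypothetical helper name, so treat it as a starting point rather than a complete log parser.

```python
import re
from collections import Counter

# Pattern modelled on the one JDBC error line quoted in the post (an assumption:
# other log formats may differ and would need their own pattern).
CONNECT_ERROR = re.compile(
    r"(?P<exc>java\.sql\.\w+Exception): Could not connect to "
    r"(?P<host>[\w.-]+):(?P<port>\d+) : (?P<reason>.+)"
)


def tally_connection_errors(lines):
    """Count connection failures per (host, port, reason) in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = CONNECT_ERROR.search(line)
        if match:
            key = (match["host"], int(match["port"]), match["reason"].strip())
            counts[key] += 1
    return counts
```

Passing an open log file (any iterable of lines) returns a `Counter`, and `counts.most_common()` then shows whether all failures point at the same legacy Hive metastore endpoint or at several.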

Almost forgot: Cluster runs on DBR 13.3

Does anyone have an idea, what could be the issue here?
Thanks!

4 REPLIES

Kaniz
Community Manager

Hi @breaka, it appears that you’re encountering some challenges while PoC-ing Databricks with Unity Catalog on AWS.

Let’s break down the issues you’ve described:

  1. Database Creation Delay:

    • Creating a database in an existing Unity Catalog takes over 10 minutes.
    • The cell’s status is stuck at Performing Hive catalog operation: databaseExists.
    • However, working directly on tables via paths works as expected.
  2. External Table Creation Delay:

    • Creating an external table on top of an existing Delta table using the specified location (s3://<CLOUD_URI>/dummy_data.delta) also takes 10+ minutes.
  3. Connection Errors:

    • The logs indicate java.sql.SQLNonTransientConnectionException: Could not connect to mdv2llxgl8lou0.ceptxxgorjrc.eu-central-1.rds.amazonaws.com:3306 : Connection reset.
    • Despite this, you can reach the URL/port via the web terminal, ruling out a network issue.
  4. Empty S3 Directory:

    • Your Unity Catalog is configured to store data in a specific S3 directory, but that directory remains empty.
    • The metastore does not have managed storage assigned, leaving you uncertain about where it stores database/table metadata.

Given these observations, let’s explore potential solutions:

  • Unity Catalog Configuration:

  • Metastore Health:

    • Investigate the multiple “metastore is down” messages in the cluster’s event log. A malfunctioning metastore could impact catalog operations.
    • Consider restarting or verifying the health of the metastore service.
  • Storage Credentials and Locations:

  • Connection Troubleshooting:

    • Investigate the connection reset errors. Double-check the credentials and network settings.
    • Since you can reach the URL/port via the web terminal, consider checking the security group rules and firewall settings.
  • Metadata Storage Location:

    • If your metastore does not have managed storage assigned, it’s essential to determine where it stores metadata.
    • Unity Catalog should handle metadata storage, but understanding its behavior is crucial.

Good luck with your PoC! 🚀

 

breaka
New Contributor II

Hi @Kaniz ,

thank you for your reply!

> Unity Catalog Configuration

We configured the metastore, workspace and catalog to the best of our knowledge and according to Databricks' documentation. The DB runtime and AWS itself should be fully supported.

> Metastore H ealth (Consider restarting or verifying the h ealth of the metastore service)

AFAIK, the error message is related to the legacy Hive metastore at the mdv2llxgl8lou0.ceptxxgorjrc.eu-central-1.rds.amazonaws.com address, which is hosted and maintained centrally by Databricks. Nothing we can do here.

> Storage Credentials and Locations

Testing the external location that "should" hold the Unity Catalog data via the data catalog web UI shows: "All Permissions Confirmed. The associated Storage Credential grants permission to perform all necessary operations." We successfully use the same storage credentials for external volumes on the same S3 bucket (though in a different sub-folder).

> Connection Troubleshooting

I'm not sure how we can set any credentials for the legacy hive metastore. Shouldn't this be fully managed by Databricks (via keystore)?

> Since you can reach the URL/port via the web terminal, consider checking the security group rules and firewall settings

With respect to security group rules and firewalls, is there a difference between making a network connection via Spark (JVM) and via bash/Python if it is the very same VM/container? I can also successfully create a socket with Python or shell (%sh) within a Databricks notebook.
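A plain socket check like the one described above only shows that the TCP connect succeeds, while the logged error is a reset. A slightly stricter probe waits for the server to send its first bytes, which a MySQL server normally does right after the connection is accepted. The sketch below is a minimal illustration under that assumption. `probe_tcp` and its result labels are hypothetical names, and it does not reproduce what the JDBC driver in the JVM does (authentication, TLS, connection pooling).

```python
import socket


def probe_tcp(host, port, timeout=5.0):
    """Connect to host:port and wait briefly for the server to speak first.

    Returns one of:
      "unreachable" - the TCP connect itself failed (refused, timed out, DNS error)
      "greeting"    - connected and the server sent some bytes
      "closed"      - connected, but the server closed without sending anything
      "reset"       - connected, then the connection was reset while reading
      "silent"      - connected, but nothing arrived before the timeout
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "unreachable"
    with sock:
        try:
            data = sock.recv(64)  # first bytes of whatever the server sends
        except socket.timeout:
            return "silent"
        except ConnectionResetError:
            return "reset"
        return "greeting" if data else "closed"
```

Run from a notebook cell against the metastore host and port 3306, a result of "reset" or "closed" would suggest the connection is being dropped after the TCP handshake, which a bare connect test cannot detect. A "greeting" result only shows that the path worked at that moment from that process.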

> Metadata Storage Location

I fully agree that "understanding its behavior is crucial" but apparently I'm missing something here. I just created a catalog and set its storage root to an S3 directory, where the GUI shows that we have full access.

I replied to your potential solutions, I hope this clears things up a bit.

Thanks!

breaka
New Contributor II

PS: Apparently I'm not allowed to use the word H E A L T H (without spaces) in my reply (The message body contains H e a l t h, which is not permitted in this community. Please remove this content before sending your post.)

DB_Paul
Community Manager

This word has now been whitelisted, thank you for the tip!


Head of Community, Databricks