Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

Azure Databricks with standard private link cluster event log error: "Metastore down"...

m997al
Contributor III

We have Azure Databricks with standard private link (back-end and front-end private link).

We are able to successfully attach a Databricks workspace to the Databricks metastore (ADLS Gen2 storage).

However, when trying to create tables in a catalog in the Databricks metastore, running from a cluster on the Databricks workspace, I run into the following scenario:

  1. Create catalog in metastore (success)
  2. Create schema in metastore (success)
  3. Create table in metastore... intermittent issues!
    1. Sometimes the table can be created (from notebook SQL or csv file upload)...but takes like 6 minutes!
    2. Other times, I get a message of "Metastore down" in the cluster event log.  When I go to look at what might be happening in the driver logs, I see this -->  "Caused by: java.sql.SQLNonTransientConnectionException: Socket fail to connect to host:consolidated-westus2-prod-metastore-addl-2.mysql.database.azure.com, port:3306. connect timed out"
    3. The IP address of consolidated-westus2-prod-metastore-addl-2.mysql.database.azure.com is 13.66.136.192.  That address will definitely be blocked by our internal firewall.

It seems like we are close to getting this to work.  Do we need to allow traffic to that external IP, even with standard private link?  Any ideas on what might be going on?
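To help pin down whether this is a network block rather than a metastore problem, a quick connectivity probe can be run from a notebook on the affected cluster. This is a hypothetical helper (not something from the Databricks docs) that just attempts a plain TCP connection to the consolidated metastore endpoint quoted in the driver log error:

```python
import socket

def can_connect(host: str, port: int, timeout: float = 10.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# From a notebook cell on the cluster showing the "Metastore down" event:
# can_connect("consolidated-westus2-prod-metastore-addl-2.mysql.database.azure.com", 3306)
```

If this returns False (or hangs until the timeout) from the cluster but the same host resolves fine, that points at the firewall dropping traffic to port 3306, which matches the `connect timed out` in the driver log.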

Thanks!

6 REPLIES

daniel_sahal
Esteemed Contributor

@m997al 
You still need to whitelist some of the IPs on your firewall. This can be done through service tags:
https://learn.microsoft.com/en-us/azure/databricks/security/network/classic/udr
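As a sanity check once you have a candidate allow-list, Python's standard `ipaddress` module can confirm whether the blocked metastore IP from the error actually falls inside the prefixes you plan to open. This is a hypothetical helper sketch; the example prefix is illustrative, not taken from the service tag file:

```python
import ipaddress

def ip_in_prefixes(ip: str, prefixes: list[str]) -> bool:
    """True if `ip` falls inside any of the given CIDR prefixes."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(p) for p in prefixes)

# e.g. checking the metastore IP from the driver-log error against your allow-list:
# ip_in_prefixes("13.66.136.192", ["13.66.136.192/29"])
```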

m997al
Contributor III

Thanks @daniel_sahal  !  So we are trying to get the full list of what we need to whitelist.  

The Microsoft Azure documentation is a little unclear about what we specifically need, given that we have Azure Databricks standard private link and SCC ("No Public IP" for the clusters).

I did find this:

m997al_0-1713888633730.png

...and those in turn tie to these URLs...

m997al_1-1713888666981.png

... I see some URLs for "Artifact Blob storage secondary" and "System tables storage" that are not referenced in the first list... do we need those too?

Thanks for your help!

daniel_sahal
Esteemed Contributor

@m997al 
Yes, they are needed too.
Basically, a service tag is a bundled list of IPs, so if you're using Azure Firewall you don't need to add each one separately; you can just use the service tag.
If you're using your own firewall, then you need to whitelist each IP provided in the documentation.

NOTE: If you want to see which IPs Service Tag contains, here is a full list: https://www.microsoft.com/en-us/download/details.aspx?id=56519
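If you're on your own firewall and need the raw IPs, the weekly JSON file from that download page can be parsed programmatically. A minimal sketch, assuming the file's published schema (a top-level `values` array of tags, each with `properties.addressPrefixes`); the file name varies by release date:

```python
import json

def tag_prefixes(service_tags: dict, tag_name: str = "AzureDatabricks") -> list[str]:
    """Extract the CIDR prefixes bundled under a given service tag name."""
    for value in service_tags["values"]:
        if value["name"] == tag_name:
            return value["properties"]["addressPrefixes"]
    return []

# with open("ServiceTags_Public_<date>.json") as f:          # name varies by release
#     prefixes = tag_prefixes(json.load(f))
```

The resulting prefix list is what you'd feed into your own firewall's allow rules in place of the service tag.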

m997al
Contributor III

Great, thank you!

CharlesWoo
New Contributor II

Can confirm that this approach will solve your error. I ran into a similar issue a while back.

Thank you! 
