Administration & Architecture

Azure Databricks with standard private link cluster event log error: "Metastore down"...

m997al
Contributor

We have Azure Databricks with standard private link (back-end and front-end private link).

We are able to successfully attach a Databricks workspace to the Databricks metastore (ADLS Gen2 storage).

However, when trying to create tables in a catalog in the Databricks metastore, running from a cluster on the Databricks workspace, I run into the following scenario:

  1. Create catalog in metastore (success)
  2. Create schema in metastore (success)
  3. Create table in metastore... intermittent issues!
    1. Sometimes the table can be created (from notebook SQL or csv file upload)...but takes like 6 minutes!
    2. Other times, I get a message of "Metastore down" in the cluster event log.  When I go to look at what might be happening in the driver logs, I see this -->  "Caused by: java.sql.SQLNonTransientConnectionException: Socket fail to connect to host:consolidated-westus2-prod-metastore-addl-2.mysql.database.azure.com, port:3306. connect timed out"
    3. The IP address of consolidated-westus2-prod-metastore-addl-2.mysql.database.azure.com is 13.66.136.192.  That address will definitely be blocked by our internal firewall.
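To narrow down whether this is purely a network-reachability problem, a quick TCP probe from a notebook cell can confirm whether the cluster can open a connection to the metastore host on port 3306. A minimal sketch (the hostname is the one from the driver log above; adjust the timeout to taste):

```python
import socket

def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and connect timeouts.
        return False

# Probe the consolidated metastore host from the error message.
print(can_reach(
    "consolidated-westus2-prod-metastore-addl-2.mysql.database.azure.com", 3306))
```

If this returns False (or hangs until the timeout) while DNS resolves fine, the firewall is the likely culprit.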

It seems like we are close to getting this to work.  Do we need to allow traffic to that external IP, even with standard private link?  Any ideas on what might be going on?

Thanks!

6 REPLIES

daniel_sahal
Esteemed Contributor

@m997al 
You still need to whitelist some of the IPs on your firewall. This can be done through service tags:
https://learn.microsoft.com/en-us/azure/databricks/security/network/classic/udr

m997al
Contributor

Thanks @daniel_sahal! So we are trying to get the full list of what we need to whitelist.

The Microsoft Azure documentation is a little unclear about what we need specifically, since we have Azure Databricks standard private link and SCC ("No Public IP" for the clusters).

I did find this:

[screenshot: m997al_0-1713888633730.png]

...and those in turn tie to these URLs...

[screenshot: m997al_1-1713888666981.png]

... I see some URLs for "Artifact Blob storage secondary" and "System tables storage" that are not referenced in the first list... do we need those too?
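While assembling that list, it can help to see what each required hostname currently resolves to, so the results can be cross-checked against the firewall rules. A rough sketch (the single entry below is the metastore host from the error; substitute the region-specific hostnames from the Azure Databricks networking docs):

```python
import socket

# Hostnames to check; extend with the region-specific endpoints from the docs.
endpoints = [
    "consolidated-westus2-prod-metastore-addl-2.mysql.database.azure.com",
]

def resolve(host: str) -> set[str]:
    """Return the set of IPv4 addresses the hostname currently resolves to."""
    return {info[4][0] for info in socket.getaddrinfo(host, None, socket.AF_INET)}

for host in endpoints:
    try:
        print(host, "->", sorted(resolve(host)))
    except OSError as exc:
        print(host, "-> resolution failed:", exc)
```

Note these IPs can change over time, which is exactly why service tags (or the published CIDR ranges) are safer than pinning individual addresses.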

Thanks for your help!

daniel_sahal
Esteemed Contributor

@m997al 
Yes, they are needed too.
Basically, a Service Tag is a bundled list of IPs, so if you're using Azure Firewall you don't need to add each one separately; you can just use the service tag.
If you're using your own firewall, then you need to whitelist each IP listed in the documentation.

NOTE: If you want to see which IPs Service Tag contains, here is a full list: https://www.microsoft.com/en-us/download/details.aspx?id=56519
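That downloadable file is JSON with a `values[].properties.addressPrefixes` layout, so checking which tag covers a given IP can be scripted rather than eyeballed. A minimal sketch, using a hand-made fragment in place of the real file (the tag name and prefixes below are illustrative, not authoritative):

```python
import ipaddress

def tags_covering(ip: str, service_tags_json: dict) -> list:
    """Return names of service tags whose address prefixes contain `ip`."""
    addr = ipaddress.ip_address(ip)
    hits = []
    for tag in service_tags_json.get("values", []):
        for prefix in tag["properties"].get("addressPrefixes", []):
            net = ipaddress.ip_network(prefix, strict=False)
            if addr.version == net.version and addr in net:
                hits.append(tag["name"])
                break  # one match per tag is enough
    return hits

# Hand-made fragment mimicking the published file's shape; the real file
# contains hundreds of tags and thousands of prefixes.
sample = {"values": [
    {"name": "Sql.WestUS2",
     "properties": {"addressPrefixes": ["13.66.136.192/29", "13.66.226.202/32"]}},
]}
print(tags_covering("13.66.136.192", sample))  # -> ['Sql.WestUS2']
```

Point it at the real downloaded JSON (`json.load(open(...))`) to find which tags you need to open for the metastore IP from the error.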

m997al
Contributor

Great, thank you!

CharlesWoo
New Contributor II

Can confirm that this approach will solve your error. I ran into a similar issue a while back.

Thank you! 
