Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.
Docs/Info for metastore, artifact blob storage endpoints etc for Azure Databricks

GuyPerson
New Contributor

Hi! This is probably a very newbie question and my Google skills seem to be lacking, but is there any in-depth documentation explaining the metastore, artifact blob storage, system tables storage, log blob storage, and Event Hubs services whose endpoints you need to reach from your Azure Databricks workspace? The FQDNs for the different services are well documented here: IP addresses and domains for Azure Databricks services and assets - Azure Databricks | Microsoft Lea.... But I have trouble grasping what most of them actually do, where they're located, and who has access to them (the customer? Databricks?). Can someone here point me in the right direction to any documentation?

I understand that artifact blob storage must be reachable so the workspace can pull runtimes from Databricks-managed storage down to the workspace clusters, but the other ones are a bit of a mystery.

Is it, for example, necessary to reach the Hive metastore FQDN (consolidated-australiacentral-prod-metastore.mysql.database.azure.com) when you use Unity Catalog?

And from what I can understand, the system tables storage, log blob storage, and Event Hubs services are all hosted and managed by Azure Databricks, not in the customer's tenant/subscription. What information is sent there?

Sorry for what are probably very simple questions, but for some reason I can't seem to find any documentation about this.

 

1 ACCEPTED SOLUTION

Accepted Solutions

shashank853
Databricks Employee

Hi,

Here is an overview of each service:

System Tables Storage

  • Purpose: System tables storage is used to store system-level metadata and configuration data for the Azure Databricks workspace.
  • Data Stored: This includes metadata related to the Unity Catalog, cluster configurations, job metadata, and other system-level information necessary for the workspace's operation and management.

Metastore

  • Purpose: The metastore is used to store metadata about the data stored in your Azure Databricks workspace.
  • Data Stored: Metadata such as table definitions, schema information, and other structural data.


Artifact Blob Storage

  • Purpose: This storage is necessary for the workspace to pull runtimes from Databricks-managed storage to the workspace clusters.
  • Data Stored: Artifacts such as libraries, JAR files, and other runtime dependencies.


Log Blob Storage

  • Purpose: Used for storing logs generated by the Databricks workspace.
  • Data Stored: Logs related to cluster operations, job executions, and other workspace activities.


Event Hubs

  • Purpose: Used for streaming data and event processing.
  • Data Stored: Event data and streaming logs.

These services are essential for the operation and management of the Azure Databricks workspace. The data stored in these services is primarily related to the management and operation of the workspace and does not include the customer's actual data.

The addresses for these services are documented here: https://learn.microsoft.com/en-us/azure/databricks/resources/ip-domain-region#metastore-etc
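If you need to verify that a cluster's subnet can actually reach these endpoints (for example, through a firewall or NAT), a simple TCP reachability check can help. The sketch below is a minimal, hypothetical example: the only real FQDN in it is the metastore address quoted in the question (Azure Database for MySQL listens on port 3306); the endpoints and ports for your region come from the Microsoft doc linked above.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # The metastore FQDN below is the one mentioned in the question;
    # substitute the endpoints for your own region from the linked doc.
    endpoints = [
        ("hive metastore (Azure MySQL)",
         "consolidated-australiacentral-prod-metastore.mysql.database.azure.com",
         3306),
        # ("artifact blob storage", "<region-specific FQDN>", 443),
    ]
    for name, host, port in endpoints:
        status = "reachable" if can_reach(host, port) else "BLOCKED or unreachable"
        print(f"{name}: {host}:{port} -> {status}")
```

Running this from a notebook (or via `%sh` on a cluster) shows which endpoints are blocked before you dig into NSG or firewall rules.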


