Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

How to find the legacy metastore's connection to the SQL DB (used to store metadata)

bhanu_dp
New Contributor III

I am logged into a workspace and checking the schemas in the legacy hive_metastore using serverless compute, and I can see the schemas listed.

However, when I create an all-purpose cluster and try to check the schemas in the legacy hive_metastore, I don't see them listed. If I need to configure the cluster with metastore details, which parameters do I need to set?

I am not sure which database the hive_metastore was configured against.

Is there a way to find out which SQL database the hive_metastore is mapped to? This is for Azure Databricks.

1 REPLY

mark_ott
Databricks Employee

In Azure Databricks, the visibility difference you observe between Serverless SQL and All-Purpose Clusters when listing schemas in the hive_metastore is due to cluster-level configuration and how each environment connects to the underlying metastore. Serverless SQL is usually preconfigured to access the workspace's managed Hive metastore, while all-purpose clusters require explicit configuration if a custom or external metastore is used.

Configuring All-Purpose Clusters to Access the Hive Metastore

To connect an all-purpose cluster to the same Hive metastore used by serverless compute, you must configure Spark and Hive parameters in the cluster's Advanced Options > Spark Config section or use an init script.

Here are the required configurations:

Spark Configuration Options

spark.sql.hive.metastore.version <hive-version>
spark.sql.hive.metastore.jars builtin

For most workspaces:

  • Use Hive 2.3.9 for Databricks Runtime 10.0 and above.

  • Use builtin for the spark.sql.hive.metastore.jars value if your metastore version is supported directly in Databricks Runtime.
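For example, on a recent Databricks Runtime pointing at a Hive 2.3.9-compatible metastore, these two lines might look like the following (the version shown is an assumption; match it to the schema version of your actual metastore):

spark.sql.hive.metastore.version 2.3.9
spark.sql.hive.metastore.jars builtin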

Hive (Metastore Connection) Options

You must define the JDBC connection details for the external Hive metastore backing database (typically Azure SQL Database):

spark.hadoop.javax.jdo.option.ConnectionURL jdbc:sqlserver://<server-name>.database.windows.net:1433;database=<database-name>
spark.hadoop.javax.jdo.option.ConnectionUserName <username>
spark.hadoop.javax.jdo.option.ConnectionPassword <password>
spark.hadoop.javax.jdo.option.ConnectionDriverName com.microsoft.sqlserver.jdbc.SQLServerDriver

These properties tell the cluster where the Hive metastore database is hosted (Azure SQL, MySQL, etc.), along with credentials and driver information.
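If you set these in the cluster's Spark config, you can keep the password out of plain text by referencing a Databricks secret instead. A minimal sketch, assuming a secret scope named hive-metastore with a key named jdbc-password (both names are hypothetical):

spark.hadoop.javax.jdo.option.ConnectionPassword {{secrets/hive-metastore/jdbc-password}}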

Optional Verification Config

To ensure schema compatibility and protection:

spark.hadoop.hive.metastore.schema.verification true
spark.hadoop.hive.metastore.schema.verification.record.version true

Discovering Which SQL Database the Hive Metastore Uses

If you're unsure which database is backing your hive_metastore, there are a few ways to determine it:

  1. Check Workspace or Cluster Configuration
    Look under your cluster's Spark Config or Init Script settings. If the metastore parameters shown above are set, they include the JDBC URL pointing to the SQL database being used. You can also read these values back from a running cluster; see the notebook sketch after this list.

  2. Check Azure Databricks Workspace Deployment Settings
    For workspaces using a shared external metastore, the JDBC connection is often configured at deployment time and visible to the workspace admin in Azure Databricks Admin Console > Metastore Settings.

  3. Microsoft Purview Integration
    If your environment integrates with Microsoft Purview, you can identify the Hive Metastore's storage and lineage mapping through Purview's Databricks scan results, where the Hive source's database connection is listed.

  4. Test Connectivity
    You can verify network access to the potential metastore with a notebook command:

    %sh nc -vz <SQL-server-name>.database.windows.net 1433

    This confirms whether your cluster can reach the SQL server that might be hosting your metastore.
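As mentioned in option 1, you can also read the connection settings back from a running all-purpose cluster. Below is a minimal sketch for a Python notebook cell; it assumes the connection was supplied through the spark.hadoop.javax.jdo.option.* properties (on a workspace-internal metastore these keys may be absent or point at a Databricks-managed database), and it deliberately leaves out the password key:

# 'spark' is the SparkSession that Databricks notebooks create for you.
# The spark.hadoop. prefix is stripped when these values land in the Hadoop configuration.
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()

for key in [
    "javax.jdo.option.ConnectionURL",         # JDBC URL of the metastore database
    "javax.jdo.option.ConnectionDriverName",  # JDBC driver class
    "javax.jdo.option.ConnectionUserName",    # database user
]:
    print(key, "=", hadoop_conf.get(key))     # prints None if the key is not set

If ConnectionURL comes back populated, it names the SQL server and database backing your hive_metastore; if it comes back empty, the cluster is most likely using the workspace-default metastore.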

Summary

Serverless compute in Databricks uses a preconfigured, managed Hive metastore; for custom (all-purpose) clusters, the connection must be configured manually via:

  • spark.hadoop.javax.jdo.option.ConnectionURL

  • spark.hadoop.javax.jdo.option.ConnectionUserName

  • spark.hadoop.javax.jdo.option.ConnectionPassword

  • spark.hadoop.javax.jdo.option.ConnectionDriverName

  • spark.sql.hive.metastore.version

  • spark.sql.hive.metastore.jars builtin

If the metastore is internal (workspace default), these parameters are auto-managed. However, if the Hive metastore is external (e.g., in Azure SQL Database or MySQL), explicit configuration is required for any custom or all-purpose cluster to view its schemas.
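Once the cluster is configured, a quick way to re-test the original symptom is to list the schemas from a Python notebook cell on the all-purpose cluster. A minimal check, assuming a Unity Catalog-enabled workspace where hive_metastore is addressable as a catalog:

spark.sql("SHOW SCHEMAS IN hive_metastore").show(truncate=False)

If the metastore connection is set up correctly, this should return the same schemas you see from serverless compute.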