Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

How to find the legacy metastore's connection to the SQL DB (used to store metadata)

bhanu_dp
New Contributor III

I am logged into a workspace and checking the schemas in the legacy hive_metastore using serverless compute, and I can see the schemas listed.

However, when I create an all-purpose cluster and try to check the schemas in the legacy hive_metastore, I don't see them listed. If I need to configure the cluster with metastore details, which parameters do I need to set?

I am not sure which database the hive_metastore was configured against.

Is there a way to find out which SQL database the hive_metastore is mapped to? This is for Azure Databricks.

1 REPLY

mark_ott
Databricks Employee

In Azure Databricks, the visibility difference you observe between Serverless SQL and All-Purpose Clusters when listing schemas in the hive_metastore is due to cluster-level configuration and how each environment connects to the underlying metastore. Serverless SQL is usually preconfigured to access the workspace's managed Hive metastore, while all-purpose clusters require explicit configuration if a custom or external metastore is used.

Configuring All-Purpose Clusters to Access the Hive Metastore

To connect an all-purpose cluster to the same Hive metastore used by serverless compute, you must configure Spark and Hive parameters in the cluster's Advanced Options > Spark Config section or use an init script.

Here are the required configurations:

Spark Configuration Options

spark.sql.hive.metastore.version <hive-version>
spark.sql.hive.metastore.jars builtin

For most workspaces:

  • Use Hive 2.3.9 for Databricks Runtime 10.0 and above.

  • Use builtin for the spark.sql.hive.metastore.jars value if your metastore version is supported directly in Databricks Runtime.
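For example, on a recent Databricks Runtime pointing at a Hive 2.3.9-compatible metastore, these two lines might look like the following (the version shown is an assumption; match it to the schema version of your actual metastore):

spark.sql.hive.metastore.version 2.3.9
spark.sql.hive.metastore.jars builtin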

Hive (Metastore Connection) Options

You must define the JDBC connection details for the external Hive metastore backing database (typically Azure SQL Database):

spark.hadoop.javax.jdo.option.ConnectionURL jdbc:sqlserver://<server-name>.database.windows.net:1433;database=<database-name>
spark.hadoop.javax.jdo.option.ConnectionUserName <username>
spark.hadoop.javax.jdo.option.ConnectionPassword <password>
spark.hadoop.javax.jdo.option.ConnectionDriverName com.microsoft.sqlserver.jdbc.SQLServerDriver

These properties tell the cluster where the Hive metastore database is hosted (Azure SQL, MySQL, etc.), along with credentials and driver information.
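If you set these in the cluster's Spark config, you can keep the password out of plain text by referencing a Databricks secret instead. A minimal sketch, assuming a secret scope named hive-metastore with a key named jdbc-password (both names are hypothetical):

spark.hadoop.javax.jdo.option.ConnectionPassword {{secrets/hive-metastore/jdbc-password}}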

Optional Verification Config

To ensure schema compatibility and protection:

spark.hadoop.hive.metastore.schema.verification true
spark.hadoop.hive.metastore.schema.verification.record.version true

Discovering Which SQL Database the Hive Metastore Uses

If you're unsure which database is backing your hive_metastore, there are a few ways to determine it:

  1. Check Workspace or Cluster Configuration
    Look under your cluster's Spark Config or Init Script settings. If the metastore parameters shown above are set, they include the JDBC URL pointing to the SQL database being used. You can also read these values back from a running cluster; see the notebook sketch after this list.

  2. Check Azure Databricks Workspace Deployment Settings
    For workspaces using a shared external metastore, the JDBC connection is often configured at deployment time and visible to the workspace admin in Azure Databricks Admin Console > Metastore Settings.

  3. Microsoft Purview Integration
    If your environment integrates with Microsoft Purview, you can identify the Hive Metastore's storage and lineage mapping through Purview's Databricks scan results, where the Hive source's database connection is listed.

  4. Test Connectivity
    You can verify network access to the potential metastore with a notebook command:

    %sh nc -vz <SQL-server-name>.database.windows.net 1433

    This confirms whether your cluster can reach the SQL server that might be hosting your metastore.
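As mentioned in option 1, you can also read the connection settings back from a running all-purpose cluster. Below is a minimal sketch for a Python notebook cell; it assumes the connection was supplied through the spark.hadoop.javax.jdo.option.* properties (on a workspace-internal metastore these keys may be absent or point at a Databricks-managed database), and it deliberately leaves out the password key:

# 'spark' is the SparkSession that Databricks notebooks create for you.
# The spark.hadoop. prefix is stripped when these values land in the Hadoop configuration.
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()

for key in [
    "javax.jdo.option.ConnectionURL",         # JDBC URL of the metastore database
    "javax.jdo.option.ConnectionDriverName",  # JDBC driver class
    "javax.jdo.option.ConnectionUserName",    # database user
]:
    print(key, "=", hadoop_conf.get(key))     # prints None if the key is not set

If ConnectionURL comes back populated, it names the SQL server and database backing your hive_metastore; if it comes back empty, the cluster is most likely using the workspace-default metastore.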

Summary

Serverless compute in Databricks uses a preconfigured, managed Hive metastore; for custom (all-purpose) clusters, the connection must be configured manually via:

  • spark.hadoop.javax.jdo.option.ConnectionURL

  • spark.hadoop.javax.jdo.option.ConnectionUserName

  • spark.hadoop.javax.jdo.option.ConnectionPassword

  • spark.hadoop.javax.jdo.option.ConnectionDriverName

  • spark.sql.hive.metastore.version

  • spark.sql.hive.metastore.jars builtin

If the metastore is internal (workspace default), these parameters are auto-managed. However, if the Hive metastore is external (e.g., in Azure SQL Database or MySQL), explicit configuration is required for any custom or all-purpose cluster to view its schemas.
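Once the cluster is configured, a quick way to re-test the original symptom is to list the schemas from a Python notebook cell on the all-purpose cluster. A minimal check, assuming a Unity Catalog-enabled workspace where hive_metastore is addressable as a catalog:

spark.sql("SHOW SCHEMAS IN hive_metastore").show(truncate=False)

If the metastore connection is set up correctly, this should return the same schemas you see from serverless compute.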