In Azure Databricks, the visibility difference you observe between Serverless SQL and All-Purpose Clusters when listing schemas in the hive_metastore is due to cluster-level configuration and how each environment connects to the underlying metastore. Serverless SQL is usually preconfigured to access the workspace's managed Hive metastore, while all-purpose clusters require explicit configuration if a custom or external metastore is used.
Configuring All-Purpose Clusters to Access the Hive Metastore
To connect an all-purpose cluster to the same Hive metastore used by serverless compute, you must configure Spark and Hive parameters in the cluster's Advanced Options > Spark Config section or use an init script.
Here are the required configurations:
Spark Configuration Options
spark.sql.hive.metastore.version <hive-version>
spark.sql.hive.metastore.jars builtin
For most workspaces:
- Use Hive 2.3.9 for Databricks Runtime 10.0 and above.
- Use builtin for the spark.sql.hive.metastore.jars value if your metastore version is supported directly in Databricks Runtime.
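For example, on a recent Databricks Runtime the two properties would typically look like this (2.3.9 is an illustrative value; match it to your actual metastore version):
spark.sql.hive.metastore.version 2.3.9
spark.sql.hive.metastore.jars builtin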
Hive (Metastore Connection) Options
You must define the JDBC connection details for the external Hive metastore backing database (typically Azure SQL Database):
spark.hadoop.javax.jdo.option.ConnectionURL jdbc:sqlserver://<server-name>.database.windows.net:1433;database=<database-name>
spark.hadoop.javax.jdo.option.ConnectionUserName <username>
spark.hadoop.javax.jdo.option.ConnectionPassword <password>
spark.hadoop.javax.jdo.option.ConnectionDriverName com.microsoft.sqlserver.jdbc.SQLServerDriver
These properties tell the cluster where the Hive metastore database is hosted (Azure SQL, MySQL, etc.), along with credentials and driver information.
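Putting it together, a complete block pasted into the cluster's Spark Config might look like the following sketch. The server, database, and user names are placeholders, and the password line assumes a Databricks secret scope named metastore with a key hive-password exists; prefer a secret reference like this over a plain-text password:
spark.sql.hive.metastore.version 2.3.9
spark.sql.hive.metastore.jars builtin
spark.hadoop.javax.jdo.option.ConnectionURL jdbc:sqlserver://myserver.database.windows.net:1433;database=hivemetastoredb
spark.hadoop.javax.jdo.option.ConnectionUserName hiveuser
spark.hadoop.javax.jdo.option.ConnectionPassword {{secrets/metastore/hive-password}}
spark.hadoop.javax.jdo.option.ConnectionDriverName com.microsoft.sqlserver.jdbc.SQLServerDriver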
Optional Verification Config
To ensure schema compatibility and protection:
spark.hadoop.hive.metastore.schema.verification true
spark.hadoop.hive.metastore.schema.verification.record.version true
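Once the cluster restarts with these settings, a quick sanity check from a notebook is to list the databases it can now see. A minimal Python sketch (spark is the session Databricks provides in every notebook):
# Confirm the cluster can enumerate schemas from the shared Hive metastore
spark.sql("SHOW DATABASES").show(truncate=False)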
Discovering Which SQL Database the Hive Metastore Uses
If you're unsure which database is backing your hive_metastore, there are a few ways to determine it:
- Check Workspace or Cluster Configuration: Look under your cluster's Spark Config or init script settings. If the metastore parameters shown above are set, they include the JDBC URL pointing to the SQL database being used (see the notebook sketch after this list).
- Check Azure Databricks Workspace Deployment Settings: For workspaces using a shared external metastore, the JDBC connection is often configured at deployment time and is visible to the workspace admin in the Azure Databricks Admin Console under the metastore settings.
- Microsoft Purview Integration: If your environment integrates with Microsoft Purview, you can identify the Hive metastore's storage and lineage mapping through Purview's Databricks scan results, where the Hive source's database connection is listed.
- Test Connectivity: You can verify network access to the potential metastore with a notebook command:
%sh
nc -vz <SQL-server-name>.database.windows.net 1433
This confirms whether your cluster can reach the SQL server that might be hosting your metastore.
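For the first item above, the JDBC URL can also be read directly from a notebook if it was set through the cluster's Spark config. A minimal Python sketch, assuming the property names listed earlier:
# Read the Hive metastore JDBC URL the current cluster was configured with.
# Falls back to "not set" if the cluster uses the workspace-managed default metastore.
jdbc_url = spark.sparkContext.getConf().get(
    "spark.hadoop.javax.jdo.option.ConnectionURL", "not set"
)
print(jdbc_url)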
Summary
Serverless compute in Databricks uses a pre-linked managed Hive metastore; for custom clusters, the connection must be configured manually via:
- spark.hadoop.javax.jdo.option.ConnectionURL
- spark.hadoop.javax.jdo.option.ConnectionUserName
- spark.hadoop.javax.jdo.option.ConnectionPassword
- spark.hadoop.javax.jdo.option.ConnectionDriverName
- spark.sql.hive.metastore.version
- spark.sql.hive.metastore.jars (builtin)
If the metastore is internal (workspace default), these parameters are auto-managed. However, if the Hive metastore is external (e.g., in Azure SQL Database or MySQL), explicit configuration is required for any custom or all-purpose cluster to view its schemas.