How to use Databricks Unity Catalog as the metastore for a local Spark session
12-05-2024 11:42 PM
Hello,
I would like to access Databricks Unity Catalog from a Spark session created outside the Databricks environment. Previously I used the Hive metastore and had no issues connecting this way. Now that I have switched the metastore to Unity Catalog, I want to connect a local Spark session to it in the same way.
The Unity Catalog documentation includes some guidance on this and provides a sample configuration.
However, I'm not sure how to adapt that configuration for Databricks Unity Catalog. I would appreciate your assistance on this matter.
12-06-2024 08:16 AM
To connect to Databricks Unity Catalog from a local Spark session, you'll need to configure your Spark session with the appropriate dependencies, extensions, and authentication. Here's a general setup:
- Install the required libraries, such as io.delta:delta-spark_2.12:3.2.1 and io.unitycatalog:unitycatalog-spark_2.12:0.2.0. You'll need to check for compatible versions; try looking in your cluster's /databricks/jars directory or the driver's classpath for matching library versions.
- Configure your Spark session using the following settings (a programmatic PySpark equivalent is sketched after this list):
  --name "local-uc-test"
  --master "local[*]"
  --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension"
  --conf "spark.sql.catalog.spark_catalog=io.unitycatalog.spark.UCSingleCatalog"
  --conf "spark.sql.catalog.unity.uri=<YOUR_UNITY_CATALOG_ENDPOINT>"
  --conf "spark.sql.catalog.unity.token=<YOUR_ACCESS_TOKEN>"
  --conf "spark.sql.defaultCatalog=unity"
- For authentication, use a valid personal access token for spark.sql.catalog.unity.token. You can generate one in your Databricks workspace with the appropriate permissions (Settings -> Developer -> Generate New Token).
- Once the Spark session starts, validate the setup by running SHOW CATALOGS in a Spark SQL query to confirm access to Unity Catalog.
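For anyone who prefers creating the session programmatically, here is a rough PySpark sketch equivalent to the flags above. Treat it as a sketch only: the app name, the DATABRICKS_TOKEN environment variable, the endpoint placeholder, and the "unity" catalog alias are values you would substitute, and the explicit spark.sql.catalog.unity registration line follows the open-source Unity Catalog docs rather than the flag list above.

```python
import os

from pyspark.sql import SparkSession

# Placeholders -- replace with your own workspace URL; the token is read from an
# environment variable so it never ends up in source control.
UC_ENDPOINT = "https://<account-name>.cloud.databricks.com/api/2.1/unity-catalog"
UC_TOKEN = os.environ["DATABRICKS_TOKEN"]

spark = (
    SparkSession.builder
    .appName("local-uc-test")
    .master("local[*]")
    # Pull the Delta and Unity Catalog client jars; pin versions that match your cluster.
    .config(
        "spark.jars.packages",
        "io.delta:delta-spark_2.12:3.2.1,io.unitycatalog:unitycatalog-spark_2.12:0.2.0",
    )
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog", "io.unitycatalog.spark.UCSingleCatalog")
    .config("spark.sql.catalog.unity", "io.unitycatalog.spark.UCSingleCatalog")
    .config("spark.sql.catalog.unity.uri", UC_ENDPOINT)
    .config("spark.sql.catalog.unity.token", UC_TOKEN)
    .config("spark.sql.defaultCatalog", "unity")
    .getOrCreate()
)

# Sanity check: list the catalogs visible through Unity Catalog.
spark.sql("SHOW CATALOGS").show()
```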
If any issues arise, ensure network connectivity to the Unity Catalog endpoint and verify that your environment has the necessary configurations.
12-06-2024 04:09 PM - edited 12-06-2024 04:09 PM
Thank you for replying to my question @VZLA, it is super helpful. I have a small question:
Is the UNITY_CATALOG_ENDPOINT something like the URL below, or should I create an endpoint in the UC settings?
https://<<account-name>>.cloud.databricks.com/api/2.1/unity-catalog
12-09-2024 01:22 AM
@furkancelik Glad it helps.
I just found this article which I believe will clarify many of your doubts. Please go straight to the section "Accessing Databricks UC from the PySpark shell". Note that the "unity" in the configuration strings will be your UC default catalog.
To answer your question, the UNITY_CATALOG_ENDPOINT should be in the format:
https://<account-name>.cloud.databricks.com/api/2.1/unity-catalog
You do not need to create an endpoint in Unity Catalog settings; there is one available by default that you can try out first.
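If you want to sanity-check the endpoint and token before wiring them into Spark, one option is to call the Unity Catalog REST API directly. A minimal sketch, assuming the workspace host placeholder above and a personal access token in a DATABRICKS_TOKEN environment variable:

```python
import os

import requests

# Placeholders -- substitute your own workspace host; the token comes from the environment.
WORKSPACE = "https://<account-name>.cloud.databricks.com"
TOKEN = os.environ["DATABRICKS_TOKEN"]

# Listing catalogs is a cheap way to confirm the endpoint and token work.
resp = requests.get(
    f"{WORKSPACE}/api/2.1/unity-catalog/catalogs",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
for catalog in resp.json().get("catalogs", []):
    print(catalog["name"])
```

If this call succeeds, the same host and token should work for spark.sql.catalog.unity.uri and spark.sql.catalog.unity.token.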

