Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to use Databricks Unity Catalog as metastore for a local spark session

furkancelik
New Contributor II

Hello,

I would like to access Databricks Unity Catalog from a Spark session created outside the Databricks environment. Previously, I used the Hive metastore and had no issues connecting this way. Now I've switched the metastore to Unity Catalog and want to connect a local Spark session to it in the same way.

The Unity Catalog documentation includes some guidance on this, and the following configuration was shared:

 

bin/pyspark --name "local-uc-test" \
    --master "local[*]" \
    --packages "io.delta:delta-spark_2.12:3.2.1,io.unitycatalog:unitycatalog-spark_2.12:0.2.0" \
    --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
    --conf "spark.sql.catalog.spark_catalog=io.unitycatalog.spark.UCSingleCatalog" \
    --conf "spark.sql.catalog.unity=io.unitycatalog.spark.UCSingleCatalog" \
    --conf "spark.sql.catalog.unity.uri=http://localhost:8080" \
    --conf "spark.sql.catalog.unity.token=" \
    --conf "spark.sql.defaultCatalog=...

However, I’m not sure how to adapt this configuration for Databricks Unity Catalog. I would appreciate your assistance on this matter.

3 REPLIES

VZLA
Databricks Employee

To connect to Databricks Unity Catalog from a local Spark session, you'll need to configure your Spark session with the appropriate dependencies, extensions, and authentication. Here's a general setup:

  1. Install the required libraries, such as io.delta:delta-spark_2.12:3.2.1 and io.unitycatalog:unitycatalog-spark_2.12:0.2.0. Check that the versions are compatible with your cluster, for example by looking in the cluster's /databricks/jars or the driver classpath for matching library versions.
  2. Configure your Spark session with the following settings (a PySpark equivalent is sketched after this list):
    --name "local-uc-test"
    --master "local[*]"
    --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension"
    --conf "spark.sql.catalog.spark_catalog=io.unitycatalog.spark.UCSingleCatalog"
    --conf "spark.sql.catalog.unity=io.unitycatalog.spark.UCSingleCatalog"
    --conf "spark.sql.catalog.unity.uri=<YOUR_UNITY_CATALOG_ENDPOINT>"
    --conf "spark.sql.catalog.unity.token=<YOUR_ACCESS_TOKEN>"
    --conf "spark.sql.defaultCatalog=unity"
  3. For authentication, use a valid personal access token for spark.sql.catalog.unity.token. You can generate one in your Databricks workspace with the appropriate permissions (Settings -> Developer -> Generate New Token).
  4. Once the Spark session starts, validate the setup by running SHOW CATALOGS in a Spark SQL query to confirm access to Unity Catalog.
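If you prefer to build the session from a Python script rather than the pyspark launcher, the same settings can be passed through SparkSession.builder. The snippet below is only a minimal sketch of the configuration above; the endpoint, token, and package versions are placeholders and must match your environment:

from pyspark.sql import SparkSession

# Minimal sketch: a local Spark session configured to use Unity Catalog.
# Replace the endpoint and token placeholders, and keep the package versions
# in sync with the Delta/Unity Catalog connector versions on your cluster.
spark = (
    SparkSession.builder.appName("local-uc-test")
    .master("local[*]")
    .config(
        "spark.jars.packages",
        "io.delta:delta-spark_2.12:3.2.1,"
        "io.unitycatalog:unitycatalog-spark_2.12:0.2.0",
    )
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog", "io.unitycatalog.spark.UCSingleCatalog")
    .config("spark.sql.catalog.unity", "io.unitycatalog.spark.UCSingleCatalog")
    .config("spark.sql.catalog.unity.uri", "<YOUR_UNITY_CATALOG_ENDPOINT>")
    .config("spark.sql.catalog.unity.token", "<YOUR_ACCESS_TOKEN>")
    .config("spark.sql.defaultCatalog", "unity")
    .getOrCreate()
)

# Validate the setup (step 4 above).
spark.sql("SHOW CATALOGS").show()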

If any issues arise, ensure network connectivity to the Unity Catalog endpoint and verify that your environment has the necessary configurations.

furkancelik
New Contributor II

Thank you for replying to my question @VZLA , it is super helpful. I have a small question:
Is the UNITY_CATALOG_ENDPOINT something like the below, or should I create an endpoint in the UC settings?

https://<<account-name>>.cloud.databricks.com/api/2.1/unity-catalog

VZLA
Databricks Employee

@furkancelik Glad it helps.

I just found this article, which I believe will clarify many of your doubts. Please go straight to the "Accessing Databricks UC from the PySpark shell" section. Note that the "unity" in the configuration strings will be your UC default catalog.

To answer your question, the UNITY_CATALOG_ENDPOINT should be in the format:

https://<account-name>.cloud.databricks.com/api/2.1/unity-catalog

You do not need to create an endpoint in Unity Catalog settings; there is one available by default that you can try out first.
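As an illustration only (the workspace host and token below are placeholders, not real values), plugging that endpoint into the earlier configuration would look like this:

from pyspark.sql import SparkSession

# Sketch: point the "unity" catalog at a Databricks workspace endpoint.
# Replace <account-name> and the token with your own workspace values.
uc_endpoint = "https://<account-name>.cloud.databricks.com/api/2.1/unity-catalog"

spark = (
    SparkSession.builder.appName("local-uc-test")
    .config("spark.sql.catalog.unity", "io.unitycatalog.spark.UCSingleCatalog")
    .config("spark.sql.catalog.unity.uri", uc_endpoint)
    .config("spark.sql.catalog.unity.token", "<YOUR_PERSONAL_ACCESS_TOKEN>")
    .config("spark.sql.defaultCatalog", "unity")
    .getOrCreate()
)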
