cancel
Showing results for 
Search instead for 
Did you mean: 
Data Governance
Join discussions on data governance practices, compliance, and security within the Databricks Community. Exchange strategies and insights to ensure data integrity and regulatory compliance.
cancel
Showing results for 
Search instead for 
Did you mean: 

Is it possible to query databricks unity catalog/glue catalog delta tables from presto?

ashish577
New Contributor III

Hi,

I am trying to setup presto in aws emr to be able to read delta tables. For this, as per my understanding, I need to setup the delta.properties file. Does anybody know how do I find the hive.metastore.uri that is needed for this file? For either unity catalog or glue catalog being used in databricks? I am already using glue as catalog for presto, but unable to query delta tables as it throws the error - "Not Hive Table"

3 REPLIES 3

Anonymous
Not applicable

To find the hive.metastore.uri needed for the delta.properties file, you can follow the below steps:

  1. Login to your AWS Glue console
  2. Click on the "Crawlers" tab and select the Glue crawler that you are using for your Delta tables
  3. In the "Crawler details" section, scroll down to "Data stores" and find the "JDBC URL" field
  4. Copy the value of the "JDBC URL" field and paste it into the hive.metastore.uri field in your delta.properties file

Alternatively, you can also find the hive.metastore.uri in Databricks by following these steps:

  1. Open a Databricks notebook
  2. Run the command spark.conf.get("spark.sql.catalogImplementation") to check which catalog implementation is being used. It should return either "hive" or "in-memory"
  3. If the catalog implementation is "hive", run the command spark.conf.get("hive.metastore.uris") to get the hive.metastore.uri
  4. If the catalog implementation is "in-memory", run the command spark.conf.get("spark.sql.warehouse.dir") to get the hive.metastore.uri which will be in the format dbc:derby:;databaseName=<path-to-warehouse-directory>/metastore_db;create=true

Once you have the hive.metastore.uri, you can use it in the delta.properties file to query Delta tables using Presto.

Hope this helps!

ashish577
New Contributor III

Hi Suteja,

Thanks for answering, I tried running the suggested commands and got "hive" as the catalog implementation but when trying to get "hive.metastore.uris" I get error-imagePlease note, I am using glue as the metastore for Databricks. Also, is there a way to get uri that can be used for unity catalog ? I have tried the jdbc url for the ware house with both https:// and databricks:// but that does not seem to work either.

Also, I cannot seem to find "Data Stores" in glue catalog, in the settings for the glue crawler all I see is Data Sources which has s3 paths to the buckets containing delta tables. Please let me know if I need to be looking at something else.

Thanks

Anonymous
Not applicable

@Ashish Singh​ :

The error "java.util.NoSuchElementException" typically means that the element you are trying to access is not found. In this case, it seems like the configuration option "hive.metastore.uris" is not set or not present in your Hive configuration.

If you are using Glue as the metastore for Databricks, you can get the URI for the metastore by following these steps:

  1. Go to the AWS Glue Console
  2. Click on the Glue Catalog that you are using with Databricks
  3. Click on "Databases" in the left sidebar
  4. Click on the name of the database that contains the Delta tables you want to query
  5. Click on "Tables" in the left sidebar
  6. Click on the name of the Delta table you want to query
  7. In the table details page, you should see the "Storage Descriptor" section
  8. In the "Location" field, you will see the S3 path to the Delta table
  9. Replace the S3 protocol with "glue" and remove the trailing "/" from the S3 path to get the Glue Data Catalog URI.

For example, if your S3 path is "s3://my-bucket/my-table/", the Glue Data Catalog URI would be "glue://my-bucket/my-table".

Regarding Unity Catalog, you can find the JDBC URL for Unity Catalog by following these steps:

  1. Go to the Databricks workspace
  2. Click on the "Clusters" tab in the left sidebar
  3. Click on the name of the cluster that you want to use with Presto
  4. In the "JDBC/ODBC" tab, you will see the JDBC URL for the cluster
  5. Replace the "/default" part of the URL with the name of the Unity Catalog database that contains the Delta table you want to query.

For example, if your JDBC URL is "jdbc:databricks://<databricks-instance>/default;transportMode=http;httpPath=sql/protocolv1/o/0/<databricks-cluster-id>", and your Unity Catalog database is called "my_db", the modified JDBC URL would be "jdbc:databricks://<databricks-instance>/my_db;transportMode=http;httpPath=sql/protocolv1/o/0/<databricks-cluster-id>".

Join 100K+ Data Experts: Register Now & Grow with Us!

Excited to expand your horizons with us? Click here to Register and begin your journey to success!

Already a member? Login and join your local regional user group! If there isn’t one near you, fill out this form and we’ll create one for you to join!