Is it possible to query databricks unity catalog/glue catalog delta tables from presto?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
03-21-2023 09:31 AM
Hi,
I am trying to setup presto in aws emr to be able to read delta tables. For this, as per my understanding, I need to setup the delta.properties file. Does anybody know how do I find the hive.metastore.uri that is needed for this file? For either unity catalog or glue catalog being used in databricks? I am already using glue as catalog for presto, but unable to query delta tables as it throws the error - "Not Hive Table"
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
03-25-2023 12:00 AM
To find the hive.metastore.uri needed for the delta.properties file, you can follow the below steps:
- Login to your AWS Glue console
- Click on the "Crawlers" tab and select the Glue crawler that you are using for your Delta tables
- In the "Crawler details" section, scroll down to "Data stores" and find the "JDBC URL" field
- Copy the value of the "JDBC URL" field and paste it into the hive.metastore.uri field in your delta.properties file
Alternatively, you can also find the hive.metastore.uri in Databricks by following these steps:
- Open a Databricks notebook
- Run the command spark.conf.get("spark.sql.catalogImplementation") to check which catalog implementation is being used. It should return either "hive" or "in-memory"
- If the catalog implementation is "hive", run the command spark.conf.get("hive.metastore.uris") to get the hive.metastore.uri
- If the catalog implementation is "in-memory", run the command spark.conf.get("spark.sql.warehouse.dir") to get the hive.metastore.uri which will be in the format dbc:derby:;databaseName=<path-to-warehouse-directory>/metastore_db;create=true
Once you have the hive.metastore.uri, you can use it in the delta.properties file to query Delta tables using Presto.
Hope this helps!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
03-25-2023 01:27 AM
Hi Suteja,
Thanks for answering, I tried running the suggested commands and got "hive" as the catalog implementation but when trying to get "hive.metastore.uris" I get error-Please note, I am using glue as the metastore for Databricks. Also, is there a way to get uri that can be used for unity catalog ? I have tried the jdbc url for the ware house with both https:// and databricks:// but that does not seem to work either.
Also, I cannot seem to find "Data Stores" in glue catalog, in the settings for the glue crawler all I see is Data Sources which has s3 paths to the buckets containing delta tables. Please let me know if I need to be looking at something else.
Thanks
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
04-01-2023 10:21 PM
@Ashish Singh :
The error "java.util.NoSuchElementException" typically means that the element you are trying to access is not found. In this case, it seems like the configuration option "hive.metastore.uris" is not set or not present in your Hive configuration.
If you are using Glue as the metastore for Databricks, you can get the URI for the metastore by following these steps:
- Go to the AWS Glue Console
- Click on the Glue Catalog that you are using with Databricks
- Click on "Databases" in the left sidebar
- Click on the name of the database that contains the Delta tables you want to query
- Click on "Tables" in the left sidebar
- Click on the name of the Delta table you want to query
- In the table details page, you should see the "Storage Descriptor" section
- In the "Location" field, you will see the S3 path to the Delta table
- Replace the S3 protocol with "glue" and remove the trailing "/" from the S3 path to get the Glue Data Catalog URI.
For example, if your S3 path is "s3://my-bucket/my-table/", the Glue Data Catalog URI would be "glue://my-bucket/my-table".
Regarding Unity Catalog, you can find the JDBC URL for Unity Catalog by following these steps:
- Go to the Databricks workspace
- Click on the "Clusters" tab in the left sidebar
- Click on the name of the cluster that you want to use with Presto
- In the "JDBC/ODBC" tab, you will see the JDBC URL for the cluster
- Replace the "/default" part of the URL with the name of the Unity Catalog database that contains the Delta table you want to query.
For example, if your JDBC URL is "jdbc:databricks://<databricks-instance>/default;transportMode=http;httpPath=sql/protocolv1/o/0/<databricks-cluster-id>", and your Unity Catalog database is called "my_db", the modified JDBC URL would be "jdbc:databricks://<databricks-instance>/my_db;transportMode=http;httpPath=sql/protocolv1/o/0/<databricks-cluster-id>".