Databricks Community

ashish577 · ‎03-21-2023

Hi,

I am trying to setup presto in aws emr to be able to read delta tables. For this, as per my understanding, I need to setup the delta.properties file. Does anybody know how do I find the hive.metastore.uri that is needed for this file? For either unity catalog or glue catalog being used in databricks? I am already using glue as catalog for presto, but unable to query delta tables as it throws the error - "Not Hive Table"

Anonymous · ‎03-25-2023

To find the hive.metastore.uri needed for the delta.properties file, you can follow the below steps:

Login to your AWS Glue console
Click on the "Crawlers" tab and select the Glue crawler that you are using for your Delta tables
In the "Crawler details" section, scroll down to "Data stores" and find the "JDBC URL" field
Copy the value of the "JDBC URL" field and paste it into the hive.metastore.uri field in your delta.properties file

Alternatively, you can also find the hive.metastore.uri in Databricks by following these steps:

Open a Databricks notebook
Run the command spark.conf.get("spark.sql.catalogImplementation") to check which catalog implementation is being used. It should return either "hive" or "in-memory"
If the catalog implementation is "hive", run the command spark.conf.get("hive.metastore.uris") to get the hive.metastore.uri
If the catalog implementation is "in-memory", run the command spark.conf.get("spark.sql.warehouse.dir") to get the hive.metastore.uri which will be in the format dbc:derby:;databaseName=<path-to-warehouse-directory>/metastore_db;create=true

Once you have the hive.metastore.uri, you can use it in the delta.properties file to query Delta tables using Presto.

Hope this helps!

ashish577 · ‎03-25-2023

Hi Suteja,

Thanks for answering, I tried running the suggested commands and got "hive" as the catalog implementation but when trying to get "hive.metastore.uris" I get error-Please note, I am using glue as the metastore for Databricks. Also, is there a way to get uri that can be used for unity catalog ? I have tried the jdbc url for the ware house with both https:// and databricks:// but that does not seem to work either.

Also, I cannot seem to find "Data Stores" in glue catalog, in the settings for the glue crawler all I see is Data Sources which has s3 paths to the buckets containing delta tables. Please let me know if I need to be looking at something else.

Thanks

Anonymous · ‎04-01-2023

@Ashish Singh :

The error "java.util.NoSuchElementException" typically means that the element you are trying to access is not found. In this case, it seems like the configuration option "hive.metastore.uris" is not set or not present in your Hive configuration.

If you are using Glue as the metastore for Databricks, you can get the URI for the metastore by following these steps:

Go to the AWS Glue Console
Click on the Glue Catalog that you are using with Databricks
Click on "Databases" in the left sidebar
Click on the name of the database that contains the Delta tables you want to query
Click on "Tables" in the left sidebar
Click on the name of the Delta table you want to query
In the table details page, you should see the "Storage Descriptor" section
In the "Location" field, you will see the S3 path to the Delta table
Replace the S3 protocol with "glue" and remove the trailing "/" from the S3 path to get the Glue Data Catalog URI.

For example, if your S3 path is "s3://my-bucket/my-table/", the Glue Data Catalog URI would be "glue://my-bucket/my-table".

Regarding Unity Catalog, you can find the JDBC URL for Unity Catalog by following these steps:

Go to the Databricks workspace
Click on the "Clusters" tab in the left sidebar
Click on the name of the cluster that you want to use with Presto
In the "JDBC/ODBC" tab, you will see the JDBC URL for the cluster
Replace the "/default" part of the URL with the name of the Unity Catalog database that contains the Delta table you want to query.

For example, if your JDBC URL is "jdbc:databricks://<databricks-instance>/default;transportMode=http;httpPath=sql/protocolv1/o/0/<databricks-cluster-id>", and your Unity Catalog database is called "my_db", the modified JDBC URL would be "jdbc:databricks://<databricks-instance>/my_db;transportMode=http;httpPath=sql/protocolv1/o/0/<databricks-cluster-id>".

Databricks Community

Is it possible to query databricks unity catalog/glue catalog delta tables from presto?

Connect with Databricks Users in Your Area

Introducing SAP Databricks

Serverless Compute for Notebooks, Workflows and Pipelines is now Generally Available on Google Cloud

Welcoming BladeBridge to Databricks: Accelerating Data Warehouse Migrations to Lakehouse

Databricks Clean Rooms: Now Generally Available on AWS and Azure

Securely share data, analytics and AI