Re: Where in Hive Metastore can the s3 locations o...

Nino · ‎08-11-2023

Hey @Mo, thanks for the input!

Yes, points 1 & 2 are correct 🙂

Generally speaking, the solution you suggest is something like:

run "describe detail table_name" on one of the clusters from PROD_CLUSTERS group, for all tables, generating a list of table_name & table_s3_location
Copy/move this list to the ADHOC_CLUSTER, and iterate over the list to create the view for each table

Actually, as a workaround, I already have a similar solution: a job in ****bleep** iterates over it to create the views, in ADHOC_CLUSTER.

Both jobs are scheduled over the day, so as new tables are created in PROD - new views are generated in ADHOC.

This works fine but is somewhat cumbersome (as it always requires 2 jobs for each prod & adhoc pair, some syncing, etc.).

My intention was to simplify this architecture.

The thing is, in step 1 above, the "describe" statement has to run one of the PROD_CLUSTERS. That's what forces me to use 2 jobs: one listing the tables' definitions (running in prod) when reading that list and generating the views (in ad-hoc).

The "describe detail table_name" cannot run in ADHOC_CLUSTER - this cluster uses a different HMS, where there is no table_name.

To make it clear, there are 2 different Hive Metastores:

PROD_CLUSTERS - prod_HMS
ADHOC_CLUSTER - adhoc_HMS

Hence my attempt to JDBC to prod_HMS, directly to the MySQL instance, from a notebook running in ADHOC_CLUSTER, and run a SELECT that will "imitate" the functionality of "describe detail table_name".