Nino
Contributor
Hey @Mo, thanks for the input!

Yes, points 1 & 2 are correct 🙂

Generally speaking, the solution you suggest is something like:

  1. run "describe detail table_name" on one of the clusters from PROD_CLUSTERS group, for all tables, generating a list of table_name & table_s3_location
  2.  Copy/move this list to the ADHOC_CLUSTER, and iterate over the list to create the view for each table 
Actually, as a workaround, I already have a similar solution: a job in ****bleep** iterates over it to create the views, in ADHOC_CLUSTER. 

Both jobs are scheduled over the day, so as new tables are created in PROD - new views are generated in ADHOC. 

This works fine but is somewhat cumbersome (as it always requires 2 jobs for each prod & adhoc pair, some syncing, etc.).

My intention was to simplify this architecture. 

The thing is, in step 1 above, the "describe" statement has to run one of the PROD_CLUSTERS. That's what forces me to use 2 jobs: one listing the tables' definitions (running in prod) when reading that list and generating the views (in ad-hoc). 

The  "describe detail table_name" cannot run in ADHOC_CLUSTER - this cluster uses a different HMS, where there is no table_name. 

To make it clear, there are 2 different Hive Metastores:
  1. PROD_CLUSTERS - prod_HMS
  2. ADHOC_CLUSTER - adhoc_HMS
Hence my attempt to JDBC to prod_HMS, directly to the MySQL instance, from a notebook running in ADHOC_CLUSTER, and run a SELECT that will "imitate" the functionality of "describe detail table_name".