Options
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
08-11-2023 01:10 AM
Hey @Mo, thanks for the input!
Yes, points 1 & 2 are correct 🙂
Generally speaking, the solution you suggest is something like:
- run "describe detail table_name" on one of the clusters from PROD_CLUSTERS group, for all tables, generating a list of table_name & table_s3_location
- Copy/move this list to the ADHOC_CLUSTER, and iterate over the list to create the view for each table
Actually, as a workaround, I already have a similar solution: a job in ****bleep** iterates over it to create the views, in ADHOC_CLUSTER.
Both jobs are scheduled over the day, so as new tables are created in PROD - new views are generated in ADHOC.
This works fine but is somewhat cumbersome (as it always requires 2 jobs for each prod & adhoc pair, some syncing, etc.).
My intention was to simplify this architecture.
The thing is, in step 1 above, the "describe" statement has to run one of the PROD_CLUSTERS. That's what forces me to use 2 jobs: one listing the tables' definitions (running in prod) when reading that list and generating the views (in ad-hoc).
The "describe detail table_name" cannot run in ADHOC_CLUSTER - this cluster uses a different HMS, where there is no table_name.
To make it clear, there are 2 different Hive Metastores:
- PROD_CLUSTERS - prod_HMS
- ADHOC_CLUSTER - adhoc_HMS
Hence my attempt to JDBC to prod_HMS, directly to the MySQL instance, from a notebook running in ADHOC_CLUSTER, and run a SELECT that will "imitate" the functionality of "describe detail table_name".