Hi there @maskepravin02,
We once implemented this approach of reading from two different Hive metastores, although not on AWS and GCP, so maybe the docs below can still help. That said, it is not the recommended way.
The best approach is to create a separate Spark application per metastore, orchestrate them, write each output to shared storage, and then join the results in a downstream job, as sketched below.
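For illustration, here is a minimal sketch of one such per-metastore application. The table name, output path, and metastore URI are placeholders for your setup; you would run an equivalent job against the GCP metastore and then join the two written datasets in a later job.

import org.apache.spark.sql.SparkSession

object AwsMetastoreExport {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("AWS Metastore Export")
      // The metastore URI is fixed when the session is built, so this
      // application only ever talks to the AWS metastore.
      .config("hive.metastore.uris", "thrift://aws-metastore-uri:9083")
      .enableHiveSupport()
      .getOrCreate()

    // Read from this metastore and persist the result to shared storage
    // so a downstream job can join it with the GCP export.
    spark.table("your_table")
      .write
      .mode("overwrite")
      .parquet("s3a://your-bucket/exports/your_table_aws")

    spark.stop()
  }
}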
- One other method is dynamically switching the metastore URI within a single application, but it is quite error-prone, and I am not sure whether it works across AWS and GCP:
Here are the docs:
1. https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html
2. https://spark.apache.org/docs/latest/configuration.html#dynamically-loading-spark-properties
3. https://stackoverflow.com/questions/32714396/querying-on-multiple-hive-stores-using-apache-spark
4. Some example code I got from GPT and Gemini (treat it as a rough starting point only):
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("Dynamic Hive Metastore")
  .enableHiveSupport()
  .getOrCreate()

// Point the session at a different metastore URI at runtime.
// Caveat: the Hive metastore client is usually created once per session,
// so the switch may silently not take effect after the first Hive access,
// which is why this approach is error-prone.
def switchMetastore(spark: SparkSession, metastoreUri: String): Unit = {
  spark.conf.set("spark.hadoop.hive.metastore.uris", metastoreUri)
  // Invalidate cached metadata and data for the table so the next query
  // does not serve stale results.
  spark.catalog.refreshTable("your_table")
}

// Example usage
switchMetastore(spark, "thrift://aws-metastore-uri:9083")
val awsDf = spark.sql("SELECT * FROM your_table")
awsDf.show()

switchMetastore(spark, "thrift://gcp-metastore-uri:9083")
val gcpDf = spark.sql("SELECT * FROM your_table")
gcpDf.show()

spark.stop()
Hope this helps you move forward.