<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How can we connect to 2 different hive spark.hadoop.hive.metastore.uris in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-can-we-connect-to-2-different-hive-spark-hadoop-hive/m-p/71419#M34313</link>
    <description>&lt;P&gt;We need to read a table from &lt;STRONG&gt;2 different&lt;/STRONG&gt; &lt;STRONG&gt;spark.hadoop.hive.metastore.uris&lt;/STRONG&gt;&amp;nbsp;and do some validations.&lt;/P&gt;&lt;P&gt;We are not able to connect to both &lt;STRONG&gt;spark.hadoop.hive.metastore.uris&lt;/STRONG&gt; at the &lt;STRONG&gt;same time&lt;/STRONG&gt; using a single &lt;STRONG&gt;SparkSession&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;I will be using &lt;STRONG&gt;Spark version 3.1.1&lt;/STRONG&gt; and the language is Scala.&lt;/P&gt;&lt;P&gt;Please comment if you have any suggestions.&lt;/P&gt;</description>
    <pubDate>Mon, 03 Jun 2024 08:07:01 GMT</pubDate>
    <dc:creator>maskepravin02</dc:creator>
    <dc:date>2024-06-03T08:07:01Z</dc:date>
    <item>
      <title>How can we connect to 2 different hive spark.hadoop.hive.metastore.uris</title>
      <link>https://community.databricks.com/t5/data-engineering/how-can-we-connect-to-2-different-hive-spark-hadoop-hive/m-p/71419#M34313</link>
      <description>&lt;P&gt;We need to read a table from &lt;STRONG&gt;2 different&lt;/STRONG&gt; &lt;STRONG&gt;spark.hadoop.hive.metastore.uris&lt;/STRONG&gt;&amp;nbsp;and do some validations.&lt;/P&gt;&lt;P&gt;We are not able to connect to both &lt;STRONG&gt;spark.hadoop.hive.metastore.uris&lt;/STRONG&gt; at the &lt;STRONG&gt;same time&lt;/STRONG&gt; using a single &lt;STRONG&gt;SparkSession&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;I will be using &lt;STRONG&gt;Spark version 3.1.1&lt;/STRONG&gt; and the language is Scala.&lt;/P&gt;&lt;P&gt;Please comment if you have any suggestions.&lt;/P&gt;</description>
      <pubDate>Mon, 03 Jun 2024 08:07:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-can-we-connect-to-2-different-hive-spark-hadoop-hive/m-p/71419#M34313</guid>
      <dc:creator>maskepravin02</dc:creator>
      <dc:date>2024-06-03T08:07:01Z</dc:date>
    </item>
    <item>
      <title>Re: How can we connect to 2 different hive spark.hadoop.hive.metastore.uris</title>
      <link>https://community.databricks.com/t5/data-engineering/how-can-we-connect-to-2-different-hive-spark-hadoop-hive/m-p/72712#M34590</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;&amp;nbsp;We have used &lt;SPAN&gt;&lt;STRONG&gt;spark.hadoop.hive.metastore.uris&lt;/STRONG&gt;&lt;/SPAN&gt;.&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;We created 2 Spark sessions in the same application with different Hive metastore URIs: the 1st for AWS with all the AWS properties, and the 2nd for GCP with all the GCP connection properties.&lt;BR /&gt;&lt;BR /&gt;However, even when we create the 2nd Spark session, it still points to the same metastore as the 1st.&lt;BR /&gt;&lt;BR /&gt;It seems Spark internally creates only 1 SparkContext per application. Let me know if you have any sample code or other documentation on this.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thanks in advance!&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 12 Jun 2024 04:58:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-can-we-connect-to-2-different-hive-spark-hadoop-hive/m-p/72712#M34590</guid>
      <dc:creator>maskepravin02</dc:creator>
      <dc:date>2024-06-12T04:58:03Z</dc:date>
    </item>
    <item>
      <title>Re: How can we connect to 2 different hive spark.hadoop.hive.metastore.uris</title>
      <link>https://community.databricks.com/t5/data-engineering/how-can-we-connect-to-2-different-hive-spark-hadoop-hive/m-p/72733#M34591</link>
      <description>&lt;P&gt;Hi there&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/106111"&gt;@maskepravin02&lt;/a&gt;,&lt;BR /&gt;We once implemented this approach of reading from two different Hive metastores, though not on AWS and GCP; maybe the docs can help.&lt;BR /&gt;&lt;BR /&gt;That said, it is not recommended.&amp;nbsp;&lt;/P&gt;&lt;P&gt;The best approach is to create a separate Spark application for each metastore, orchestrate them, write their outputs, and then join the results.&lt;/P&gt;&lt;P&gt;- Another method is dynamic switching, but it is quite error-prone, and I don't know whether it will work across AWS and GCP.&lt;BR /&gt;Here are the docs:&lt;BR /&gt;1. &lt;A href="https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html" target="_blank"&gt;https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;2.&amp;nbsp;&lt;A href="https://spark.apache.org/docs/latest/configuration.html#dynamically-loading-spark-properties" target="_blank"&gt;https://spark.apache.org/docs/latest/configuration.html#dynamically-loading-spark-properties&lt;/A&gt;&lt;/P&gt;&lt;P&gt;3.&amp;nbsp;&lt;A href="https://stackoverflow.com/questions/32714396/querying-on-multiple-hive-stores-using-apache-spark" target="_blank"&gt;https://stackoverflow.com/questions/32714396/querying-on-multiple-hive-stores-using-apache-spark&lt;/A&gt;&lt;/P&gt;&lt;P&gt;4. Some code I extracted from GPT and Gemini:&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("Dynamic Hive Metastore")
  .enableHiveSupport()
  .getOrCreate()

def switchMetastore(spark: SparkSession, metastoreUri: String): Unit = {
  // Point the session at a different Hive metastore.
  // Caveat: hive.metastore.uris is normally a static config, so setting it
  // on a live session may not take effect; this is why dynamic switching is error-prone.
  spark.conf.set("spark.hadoop.hive.metastore.uris", metastoreUri)
  // Drop cached table data/metadata so subsequent lookups go to the metastore
  spark.catalog.clearCache()
}

// Example usage
switchMetastore(spark, "thrift://aws-metastore-uri:9083")
val awsDf = spark.sql("SELECT * FROM your_table")
awsDf.show()

switchMetastore(spark, "thrift://gcp-metastore-uri:9083")
val gcpDf = spark.sql("SELECT * FROM your_table")
gcpDf.show()

spark.stop()&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;Hope this helps you move forward.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
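A hedged alternative sketch, not from the thread: because hive.metastore.uris is normally fixed when a session starts, conf.set on a live session may silently have no effect. A pattern that is often more reliable is to stop the active session and rebuild it with the new URI supplied as a startup config; rebuildWithMetastore is a hypothetical helper name introduced here for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper (assumption, untested): rebuild the session so the
// metastore URI is applied as a startup config rather than mutated at runtime.
def rebuildWithMetastore(metastoreUri: String): SparkSession = {
  // Stop any active session so the static Hive config can be applied fresh
  SparkSession.getActiveSession.foreach(_.stop())
  SparkSession.builder()
    .appName("Dynamic Hive Metastore")
    .config("spark.hadoop.hive.metastore.uris", metastoreUri)
    .enableHiveSupport()
    .getOrCreate()
}
```

Each switch pays the cost of tearing down and recreating a SparkContext, which is one more reason the separate-application-per-metastore approach suggested above is usually simpler.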
      <pubDate>Wed, 12 Jun 2024 06:24:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-can-we-connect-to-2-different-hive-spark-hadoop-hive/m-p/72733#M34591</guid>
      <dc:creator>ashraf1395</dc:creator>
      <dc:date>2024-06-12T06:24:58Z</dc:date>
    </item>
  </channel>
</rss>

