03-17-2022 08:00 AM
Hi all
Wanted to check if anyone has made an attempt to exploit the Hive Metastore of Databricks for lineage?
For example, I loaded the metadata of 2 Databricks databases using the Databricks driver provided on the Collibra Marketplace. Here is the scenario -
Database 1 > Table_A
Database 2 > View_A based on Table_A
As the table & view relations are implicit, I expected the driver to show lineage/links between these 2 objects across databases within Collibra but it did not.
So, I plan to fetch the relationship information from the Hive Metastore and feed it into Collibra.
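To make the idea concrete, here is a rough sketch of how a view's upstream tables might be pulled out of its definition text. It is only a heuristic under stated assumptions: the function name `referenced_tables` and the regex are my own illustration, not part of the Collibra driver or any Databricks API. It will misread CTE names, `EXTRACT(... FROM ...)` expressions, and quoted identifiers that contain dots.

```python
import re
from typing import List

# Heuristic: capture the identifier that follows FROM or JOIN,
# e.g. `db1`.`table_a` or db1.table_a. Subqueries ("FROM (") are skipped.
_TABLE_REF = re.compile(
    r"\b(?:FROM|JOIN)\s+((?:`[^`]+`|\w+)(?:\s*\.\s*(?:`[^`]+`|\w+))*)",
    re.IGNORECASE,
)


def referenced_tables(view_sql: str, default_db: str) -> List[str]:
    """Return sorted, de-duplicated 'db.table' names referenced in view_sql.

    Unqualified table names are assumed to live in default_db
    (hypothetical parameter: the database that owns the view).
    """
    found = set()
    for match in _TABLE_REF.finditer(view_sql):
        # Split the qualified name and drop backticks and case differences.
        parts = [p.strip().strip("`").lower() for p in match.group(1).split(".")]
        if len(parts) == 1:
            parts.insert(0, default_db.lower())
        found.add(".".join(parts))
    return sorted(found)


if __name__ == "__main__":
    ddl = "CREATE VIEW database_2.view_a AS SELECT * FROM database_1.table_a"
    print(referenced_tables(ddl, default_db="database_2"))
```

In a notebook, the definition text could probably be obtained with something like `spark.sql("SHOW CREATE TABLE database_2.view_a").first()[0]`, and each resulting (view, table) pair would then become one lineage link to load into Collibra.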
Couple of questions -
03-22-2022 10:01 AM
Hi @NOOR BASHA SHAIK ,
Azure Purview now supports Hive Metastore Database as a source. The Hive Metastore source supports Full scan to extract metadata from a Hive Metastore database and fetches Lineage between data assets. The supported platforms are Apache Hadoop, Cloudera, Hortonworks, and Databricks.
For details, please read our documentation.
04-26-2022 03:52 PM
Hi @NOOR BASHA SHAIK , Just a friendly follow-up. Do you still need help, or did my response help you find the solution? Please let us know.
05-16-2022 10:43 PM
Hi @NOOR BASHA SHAIK , Just a friendly follow-up. Do you still need help, or did Kaniz's response help you find the solution?
05-16-2022 11:52 PM
@Chetan Kardekar @Kaniz Fatma yes, I still need a standard way (through SQL) to access Hive Metastore.
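For reference, one possible SQL route is sketched below. It assumes an external Hive metastore whose backing database can be read directly, and it relies on the standard metastore schema tables `DBS` and `TBLS` (with `TBL_TYPE = 'VIRTUAL_VIEW'` marking views). The in-memory SQLite database and its rows are only a stand-in so the example runs anywhere; a real metastore would typically sit on MySQL, PostgreSQL, or similar, and on PostgreSQL the upper-case identifiers would need double quotes. The helper names `list_views` and `demo_connection` are hypothetical.

```python
import sqlite3

# Query against the metastore schema: one row per view with its database
# name and the stored view definition text.
VIEW_QUERY = """
SELECT d.NAME AS db_name,
       t.TBL_NAME AS view_name,
       t.VIEW_EXPANDED_TEXT AS view_sql
FROM TBLS t
JOIN DBS d ON t.DB_ID = d.DB_ID
WHERE t.TBL_TYPE = 'VIRTUAL_VIEW'
ORDER BY d.NAME, t.TBL_NAME
"""


def list_views(conn):
    """Return (db_name, view_name, view_sql) tuples for every view."""
    return conn.execute(VIEW_QUERY).fetchall()


def demo_connection():
    """Build a tiny stand-in for the metastore database (illustrative rows only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE DBS (DB_ID INTEGER PRIMARY KEY, NAME TEXT);
        CREATE TABLE TBLS (
            TBL_ID INTEGER PRIMARY KEY,
            DB_ID INTEGER,
            TBL_NAME TEXT,
            TBL_TYPE TEXT,
            VIEW_EXPANDED_TEXT TEXT
        );
        INSERT INTO DBS VALUES (1, 'database_1'), (2, 'database_2');
        INSERT INTO TBLS VALUES
            (10, 1, 'table_a', 'MANAGED_TABLE', NULL),
            (20, 2, 'view_a', 'VIRTUAL_VIEW', 'SELECT * FROM database_1.table_a');
        """
    )
    return conn


if __name__ == "__main__":
    for row in list_views(demo_connection()):
        print(row)
```

With the default Databricks-managed metastore the backing database is not directly reachable, so this approach would only apply once an external metastore is in place.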
05-17-2022 12:28 AM
Hi @NOOR BASHA SHAIK , This article describes how to set up Databricks clusters to connect to existing external Apache Hive metastores.
It provides information about metastore deployment modes, recommended network setup, and cluster configuration requirements, followed by instructions for configuring clusters to connect to an external metastore.
The following table summarizes which Hive metastore versions are supported in each version of Databricks Runtime.
Databricks on Google Cloud supports Databricks Runtime 7.3 and above.
Important
06-07-2022 09:48 AM
Hi @NOOR BASHA SHAIK,
Just a friendly follow-up. Are you still looking for help, or did Kaniz's reply help you resolve your question?