03-17-2022 08:00 AM
Hi all
Wanted to check if anyone has made an attempt to exploit the Hive Metastore of Databricks for lineage?
For example, I loaded the metadata of two Databricks databases using the Databricks driver from the Collibra Marketplace. Here is the scenario:
Database 1 > Table_A
Database 2 > View_A based on Table_A
As the table & view relations are implicit, I expected the driver to show lineage/links between these 2 objects across databases within Collibra but it did not.
So I plan to fetch the relationship information from the Hive Metastore and feed it into Collibra.
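A minimal sketch of the extraction step in that plan, assuming the view's SQL text is already in hand (for example, copied from the view definition stored in the metastore). The helper names `referenced_tables` and `lineage_edges` are hypothetical, and the regex is deliberately naive: it only looks at identifiers following `FROM` or `JOIN`, so it will miss or misread CTEs, functions such as `EXTRACT(... FROM ...)`, and other edge cases. A real SQL parser would be more reliable.

```python
import re

# Naive pattern: an optionally qualified identifier right after FROM or JOIN.
_TABLE_REF = re.compile(
    r"\b(?:FROM|JOIN)\s+([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)",
    re.IGNORECASE,
)


def referenced_tables(view_sql):
    """Return the distinct table names a view's SQL text appears to read from."""
    found = []
    # Strip backticks so `db`.`tbl` is treated the same as db.tbl.
    for name in _TABLE_REF.findall(view_sql.replace("`", "")):
        key = name.lower()  # Hive Metastore identifiers are case-insensitive
        if key not in found:
            found.append(key)
    return found


def lineage_edges(view_name, view_sql):
    """Return (source_table, target_view) pairs for one view."""
    target = view_name.lower()
    return [(source, target) for source in referenced_tables(view_sql)]


edges = lineage_edges(
    "database2.view_a",
    "SELECT * FROM `database1`.`Table_A`",
)
print(edges)
```

Each pair is a source table and the view that depends on it, which is the shape a lineage import into a catalog tool would typically need.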
Couple of questions -
03-22-2022 10:01 AM
Hi @NOOR BASHA SHAIK ,
Azure Purview now supports Hive Metastore Database as a source. The Hive Metastore source supports Full scan to extract metadata from a Hive Metastore database and fetches Lineage between data assets. The supported platforms are Apache Hadoop, Cloudera, Hortonworks, and Databricks.
For details, please read our documentation.
04-26-2022 03:52 PM
Hi @NOOR BASHA SHAIK , Just a friendly follow-up. Do you still need help, or did my response help you find the solution? Please let us know.
05-16-2022 10:43 PM
Hi @NOOR BASHA SHAIK , Just a friendly follow-up. Do you still need help, or did Kaniz's response help you find the solution?
05-16-2022 11:52 PM
@Chetan Kardekar @Kaniz Fatma yes, I still need a standard way (through SQL) to access Hive Metastore.
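One possible SQL-only route, sketched under assumptions: Spark SQL's `SHOW VIEWS` and `DESCRIBE TABLE EXTENDED` statements can be run from a notebook, and for a view the extended description is assumed to include a `View Text` row holding the defining query. The function name `list_view_definitions` is hypothetical, and `spark` is the usual SparkSession available in a Databricks notebook. Output columns can differ between runtime versions, so treat this as a starting point rather than a confirmed method.

```python
def list_view_definitions(spark, database):
    """Return {qualified_view_name: view_sql} for the permanent views in a database.

    Assumes SHOW VIEWS yields `viewName` and `isTemporary` columns and that
    DESCRIBE TABLE EXTENDED reports the defining query in a `View Text` row.
    """
    definitions = {}
    for row in spark.sql(f"SHOW VIEWS IN {database}").collect():
        if row["isTemporary"]:
            continue  # temporary views are session-scoped, not metastore objects
        qualified = f"{database}.{row['viewName']}"
        details = spark.sql(f"DESCRIBE TABLE EXTENDED {qualified}").collect()
        for detail in details:
            if detail["col_name"] == "View Text":
                definitions[qualified] = detail["data_type"]
                break
    return definitions
```

In a notebook this would be called as `list_view_definitions(spark, "database2")`; the returned SQL text can then be inspected for the tables each view reads from.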
05-17-2022 12:28 AM
Hi @NOOR BASHA SHAIK , This article describes how to set up Databricks clusters to connect to existing external Apache Hive metastores.
It provides information about metastore deployment modes, recommended network setup, and cluster configuration requirements, followed by instructions for configuring clusters to connect to an external metastore.
The following table summarizes which Hive metastore versions are supported in each version of Databricks Runtime.
Databricks on Google Cloud supports Databricks Runtime 7.3 and above.
Important
06-07-2022 09:48 AM
Hi @NOOR BASHA SHAIK,
Just a friendly follow-up. Are you still looking for help, or did Kaniz's reply help you resolve your question?