Exploiting Hive Metastore of Databricks for lineage

NOOR_BASHASHAIK
Contributor

Hi all

Wanted to check if anyone has made an attempt to exploit the Hive Metastore of Databricks for lineage?

For example, I loaded the metadata of 2 Databricks databases using the Databricks driver provided on the Collibra Marketplace. Here is the scenario:

Database 1 > Table_A

Database 2 > View_A based on Table_A

As the table & view relations are implicit, I expected the driver to show lineage/links between these 2 objects across databases within Collibra but it did not.

So, I plan to fetch the relationship information from the Hive Metastore and feed it into Collibra.

A couple of questions:

  1. Where can I see the data model of the Hive Metastore? Is there any documentation link from the Databricks side so I can quickly understand the schemata of the metastore?
  2. Is it advisable to query the metastore tables or are there any side-effects?
  3. How easy will it be to fetch the relationships between tables & views? Is there an out-of-the-box query?

Accepted Solution

Hi @NOOR BASHA SHAIK, this article describes how to set up Databricks clusters to connect to existing external Apache Hive metastores.

It provides information about metastore deployment modes, recommended network setup, and cluster configuration requirements, followed by instructions for configuring clusters to connect to an external metastore.

The following table summarizes which Hive metastore versions are supported in each version of Databricks Runtime.

Databricks on Google Cloud supports Databricks Runtime 7.3 and above.

Important

  • SQL Server does not work as the underlying metastore database for Hive 2.0 and above.
  • If you use Azure Database for MySQL as an external metastore, you must change the value of the lower_case_table_names property from 1 (the default) to 2 in the server-side database configuration. For details, see Identifier Case Sensitivity.
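
To make the cluster configuration step a bit more concrete, here is a minimal sketch of the Spark configuration entries typically involved when pointing a cluster at an external metastore. The property keys are the standard Spark/Hive ones; the helper names, JDBC URL, user, secret scope, driver class, and Hive version below are placeholders I made up for illustration, so please check them against the supported-versions table and your own environment.

```python
def external_metastore_spark_conf(jdbc_url, user, password, driver,
                                  hive_version, jars="builtin"):
    """Build the Spark config entries for an external Hive metastore.

    All argument values are supplied by the caller; nothing here is
    validated against a real metastore.
    """
    return {
        # Hive metastore client version and where to load its jars from.
        "spark.sql.hive.metastore.version": hive_version,
        "spark.sql.hive.metastore.jars": jars,
        # JDBC connection details of the database backing the metastore.
        "spark.hadoop.javax.jdo.option.ConnectionURL": jdbc_url,
        "spark.hadoop.javax.jdo.option.ConnectionUserName": user,
        "spark.hadoop.javax.jdo.option.ConnectionPassword": password,
        "spark.hadoop.javax.jdo.option.ConnectionDriverName": driver,
    }


def to_cluster_conf_lines(conf):
    """Render the dict as 'key value' lines, one per entry, in insertion order."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


if __name__ == "__main__":
    # Placeholder values only (assumptions, not taken from this thread).
    conf = external_metastore_spark_conf(
        jdbc_url="jdbc:mysql://example-host:3306/metastore",
        user="metastore_user",
        password="{{secrets/example-scope/metastore-password}}",
        driver="org.mariadb.jdbc.Driver",
        hive_version="2.3.9",
    )
    print(to_cluster_conf_lines(conf))
```

The printed lines can be pasted into the cluster's Spark config field. Referencing the password through a secret, rather than typing it in plain text, keeps it out of the cluster definition.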

6 REPLIES

Kaniz
Community Manager

Hi @NOOR BASHA SHAIK,

Azure Purview now supports Hive Metastore Database as a source. The Hive Metastore source supports Full scan to extract metadata from a Hive Metastore database and fetches Lineage between data assets. The supported platforms are Apache Hadoop, Cloudera, Hortonworks, and Databricks.

For details, please read our documentation.

Kaniz
Community Manager

Hi @NOOR BASHA SHAIK, just a friendly follow-up. Do you still need help, or did my response help you find the solution? Please let us know.

Anonymous
Not applicable

Hi @NOOR BASHASHAIK (Customer), just a friendly follow-up. Do you still need help, or did Kaniz's response help you find the solution?

NOOR_BASHASHAIK
Contributor

@Chetan Kardekar @Kaniz Fatma Yes, I still need a standard way (through SQL) to access the Hive Metastore.
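
For anyone landing here with the same need, below is a rough sketch of one possible approach, not an official or out-of-the-box query. It assumes you can connect directly to the database backing an external Hive metastore and that it follows the standard Apache Hive metastore schema (`DBS`, `TBLS`, with view definitions in `TBLS.VIEW_EXPANDED_TEXT`). The function names and the regex-based parsing are my own simplification: the regex only catches plain `FROM`/`JOIN` references with one- or two-part names, so treat the output as a starting point.

```python
import re

# Assumed query against the metastore's backing database (standard Hive
# metastore schema). Run it with whatever SQL client you use for that database.
VIEW_QUERY = """
SELECT d.NAME AS db_name, t.TBL_NAME AS view_name, t.VIEW_EXPANDED_TEXT AS view_sql
FROM TBLS t
JOIN DBS d ON t.DB_ID = d.DB_ID
WHERE t.TBL_TYPE = 'VIRTUAL_VIEW'
"""

# Naive pattern: an identifier (optionally db-qualified, optionally
# backtick-quoted) that directly follows FROM or JOIN.
_REF = re.compile(r"\b(?:from|join)\s+(`?\w+`?(?:\.`?\w+`?)?)", re.IGNORECASE)


def source_tables(view_sql, default_db):
    """Return de-duplicated (database, table) pairs referenced in a view definition."""
    refs = []
    for match in _REF.finditer(view_sql):
        name = match.group(1).replace("`", "")
        db, _, table = name.rpartition(".")
        pair = ((db or default_db).lower(), table.lower())
        if pair not in refs:
            refs.append(pair)
    return refs


def lineage_edges(rows):
    """Turn (db_name, view_name, view_sql) rows into (source, target) name pairs."""
    edges = []
    for db, view, sql in rows:
        for src_db, src_table in source_tables(sql or "", db):
            edges.append((f"{src_db}.{src_table}", f"{db.lower()}.{view.lower()}"))
    return edges


if __name__ == "__main__":
    # Hand-written sample row mirroring the scenario in the question.
    sample = [("database2", "View_A", "SELECT * FROM database1.Table_A")]
    print(lineage_edges(sample))
```

The resulting pairs could then be mapped to whatever import format Collibra expects. Known gaps in this sketch: three-part names, subqueries, CTE names, and expressions such as `EXTRACT(YEAR FROM col)` are not handled correctly, so a proper SQL parser would be a safer choice for production use.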

Hi @NOOR BASHA SHAIK,

Just a friendly follow-up. Are you still looking for help, or did Kaniz's reply help you resolve your question?
