Data model tool to connect to Databricks or Data lake?

Nachappa
New Contributor III

Hi Everyone,

For data modeling documentation (dimensional / ER diagrams), is there a tool that can connect to Databricks or a data lake, read the table structure directly, and update the model whenever columns are added to or deleted from a table?

In the process, it should not remove the relationships defined between tables when columns or tables are added or deleted. Version control of the model, for example with Git, would also be helpful.

The reason is that, as I understand it, PK and FK details are not maintained on data lake / Databricks tables. Please suggest any modeling tools that fit this use case.

Thanks,

1 ACCEPTED SOLUTION

Kaniz
Community Manager

Hi @Darshan M S​,

Thank you for your question.

The article "Working with Entity-Relationship (ER) Diagrams on Databricks" explains how to connect one of these tools to Databricks, with a focus on generating an Entity-Relationship (ER) diagram.

Please let us know if you have any further queries.

Please don't forget to click on the "Select As Best" button whenever the information provided helps resolve your question.

8 REPLIES

Nachappa
New Contributor III

Hi @Kaniz Fatma​ ,

Thank you, this information is helpful. I went through the linked wiki pages and will start the POC; I will share feedback this week or early next week. Please also let me know whether the tool supports Git or other repositories for maintaining the model.

Kaniz
Community Manager

Hi @Darshan M S​, Thank you for sharing the update. We shall be looking forward to your feedback.

Check this article for GIT repos maintenance.

Prabakar
Esteemed Contributor III

Hi @Darshan M S, once you set up the JDBC connection, you should be able to use the SQL tool to connect to Databricks. Then you can use the repos feature where you will be integrating GIT.

Kaniz
Community Manager

Hi @Darshan M S, I was checking back to see whether @Prabakar Ammeappin's suggestions helped you. If you have found a solution, please share it with the community, as it can be helpful to others.

Nachappa
New Contributor III

Hi @Kaniz Fatma​ , @Prabakar Ammeappin​ :

Thanks for the reply and information. Yes, I am able to connect to Databricks from DBeaver over JDBC using the link provided. (Sorry for the delayed update; I had to try this on a trial version of DBeaver Enterprise.) The additional links I followed are https://docs.databricks.com/dev-tools/dbeaver.html and https://databricks.com/spark/jdbc-drivers-download (the download in the primary link points to ODBC drivers). The JDBC URL may need some changes depending on the Databricks setup; the details are in the second link.
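For anyone assembling the JDBC URL by hand, here is a minimal sketch of the two URL shapes I have seen for token authentication. The function name and the sample values are hypothetical, and the exact prefix (`jdbc:databricks://` for newer drivers, `jdbc:spark://` for older ones) and properties are assumptions that should be checked against the documentation of the driver version you downloaded.

```python
def build_databricks_jdbc_url(host: str, http_path: str, token: str,
                              legacy_driver: bool = False) -> str:
    """Assemble a JDBC URL for personal-access-token authentication.

    Hypothetical helper: the URL layout is an assumption based on the
    commonly documented formats; verify it for your driver version.
    """
    if legacy_driver:
        # Older Simba Spark driver style.
        prefix = f"jdbc:spark://{host}:443/default;transportMode=http;ssl=1"
    else:
        # Newer Databricks driver style.
        prefix = f"jdbc:databricks://{host}:443"
    # AuthMech=3 with UID=token means "user name / password" auth where
    # the password is the personal access token.
    return f"{prefix};httpPath={http_path};AuthMech=3;UID=token;PWD={token}"


if __name__ == "__main__":
    # Placeholder values only; never commit a real token.
    print(build_databricks_jdbc_url(
        host="example.cloud.databricks.com",
        http_path="sql/protocolv1/o/0/0000-000000-example",
        token="<personal-access-token>",
    ))
```

The resulting string is what goes into the JDBC URL field of the DBeaver connection settings.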

Observations:

  • DBeaver connects cleanly to Databricks schemas and tables, and an ER diagram can be built for selected schemas or tables.

  • If PK/FK or other constraints are not defined on the Databricks Delta table, we cannot add them through the table properties while building the ER diagram. DBeaver tries to persist the change, and it fails because ALTER is not supported for these constraints; only CHECK is supported as of now.

  • Virtual relationships can be created and they look good. However, when exporting to PDF via the print option, the details of the virtual relationships are not included: we can see which table is connected to which, but the columns involved have to be worked out visually or looked up in the tool.

  • Q1: Is there any way to extract the relationships defined between tables, like an ER export, and am I exporting the right way?

  • If multiple people need to work on the same file, the location cannot be changed in the properties under the Projects tab, i.e. C:\Users\UserProfile\AppData\Roaming\DBeaverData\workspace6, and we cannot change the path.

  • Q2: I did not understand how to enable, in the tool, Prabakar's suggestion: "Then you can use the repos feature where you will be integrating GIT." A follow-up question on the same subject: if multiple people need to work in the tool, do we need multiple licenses, or is a license transferable on request? (This may be a question for the DBeaver team.)
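As a tool-independent fallback for the relationship and Git points above, here is a minimal sketch of keeping the PK/FK-style relationships in a plain JSON file next to the model and checking them against the current table structure. Everything in it is an assumption for illustration: the `Relationship` fields, the function names, and the sample tables are hypothetical, and the `schema` dictionary stands in for column lists read from Databricks by whatever means is available. It does not use or describe any DBeaver export feature.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Relationship:
    """One virtual foreign key: child_table.child_column -> parent_table.parent_column."""
    child_table: str
    child_column: str
    parent_table: str
    parent_column: str


def split_by_validity(relationships, schema):
    """Split relationships into (valid, broken) for the given schema.

    `schema` maps table name -> collection of column names. Broken
    relationships are reported, not deleted, so a column change never
    silently drops a documented relationship.
    """
    valid, broken = [], []
    for rel in relationships:
        ok = (rel.child_column in schema.get(rel.child_table, ())
              and rel.parent_column in schema.get(rel.parent_table, ()))
        (valid if ok else broken).append(rel)
    return valid, broken


def to_json(relationships):
    """Serialize in a stable order so Git diffs stay small and readable."""
    ordered = sorted(relationships, key=lambda r: (
        r.child_table, r.child_column, r.parent_table, r.parent_column))
    return json.dumps([asdict(r) for r in ordered], indent=2, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical tables; in practice the schema comes from the live catalog.
    rels = [
        Relationship("orders", "customer_id", "customers", "id"),
        Relationship("orders", "region_code", "regions", "code"),
    ]
    schema = {
        "orders": ["order_id", "customer_id"],  # region_code was dropped
        "customers": ["id", "name"],
        "regions": ["code"],
    }
    valid, broken = split_by_validity(rels, schema)
    print("needs review:", broken)
    print(to_json(rels))
```

The JSON file can be committed to any Git repository, which gives a reviewable history of the relationships even when the database itself stores no PK/FK metadata.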

Nachappa
New Contributor III

Hi @Kaniz Fatma​ , @Prabakar Ammeappin​ , Good day.

I found that by creating a new project, we can change the save location.

The only question I still have is: "Is there any way to extract the virtual key relationships defined between tables, like an ER export, and am I exporting the right way?"

Kaniz
Community Manager

@Darshan M S​ , Excellent, Thanks for the update!

Here is a guide to the relationship between tables in a data model.