07-04-2022 04:41 AM
Hi Everyone,
For data modeling documentation (dimensional model / ER diagram), is there any tool that can connect to Databricks / the data lake, read the table structure directly, and update that structure whenever columns are added to or deleted from a table?
In the process, it should not remove the relationships defined between tables whenever columns and/or tables are added or deleted. Version control of the model, using Git or similar, would also be helpful.
The reason is that, as I understand it, PK and FK details are not maintained for tables in the data lake / Databricks. Please suggest any modeling tools that fit this use case.
Thanks,
07-05-2022 04:06 AM
Hi @Darshan M S,
Thank you for your question.
This article, Working with Entity-Relationship (ER) Diagrams on Databricks, shows how to connect one such tool to Databricks, with a focus on generating an Entity-Relationship (ER) diagram.
Please let us know if you have any further queries.
Please don't forget to click on the "Select As Best" button whenever the information provided helps resolve your question.
07-05-2022 05:10 AM
Hi @Kaniz Fatma ,
Thank you for this information; it is helpful. I went through the linked wiki pages and will start the POC, and I will provide feedback by this week or early next week. Please also let me know whether the tool supports maintaining the model in Git or another repository.
07-05-2022 05:35 AM
Hi @Darshan M S, Thank you for sharing the update. We shall be looking forward to your feedback.
Check this article for GIT repos maintenance.
07-07-2022 03:12 AM
Hi @Darshan M S, once you set up the JDBC connection, you should be able to use the SQL tool to connect to Databricks. You can then use the Repos feature to integrate with Git.
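For anyone setting up the JDBC connection, here is a minimal sketch of how the connection URL might be assembled before pasting it into the SQL tool. It assumes the newer `jdbc:databricks://` URL prefix and personal-access-token authentication (`AuthMech=3`, `UID=token`); older drivers use a different prefix, so check the driver documentation for your version. The helper name `build_jdbc_url` and the host, HTTP path, and token values are hypothetical placeholders.

```python
def build_jdbc_url(host: str, http_path: str, token: str, port: int = 443) -> str:
    """Assemble a Databricks JDBC URL from its parts (hypothetical helper).

    Assumes the newer "jdbc:databricks://" prefix and token authentication;
    adjust the prefix and properties to match your driver version.
    """
    props = {
        "httpPath": http_path,  # HTTP path of the cluster or SQL endpoint
        "AuthMech": "3",        # user/password style auth, used here with a token
        "UID": "token",         # the literal string "token"
        "PWD": token,           # the personal access token itself
    }
    # Properties are appended as semicolon-separated key=value pairs.
    settings = ";".join(f"{key}={value}" for key, value in props.items())
    return f"jdbc:databricks://{host}:{port};{settings}"


# Placeholder values only; substitute your own workspace details.
url = build_jdbc_url(
    host="example.cloud.databricks.com",
    http_path="/sql/1.0/endpoints/abc123",
    token="dapiXXXX",
)
print(url)
```

Keeping the token out of any file you commit to Git is advisable; only the host and HTTP path would normally be safe to store alongside the model.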
07-07-2022 03:35 AM
Hi @Darshan M S, I was checking back to see whether @Prabakar Ammeappin's suggestions helped you. If you have found a solution, please share it with the community, as it can be helpful to others.
07-08-2022 07:16 AM
Hi @Kaniz Fatma , @Prabakar Ammeappin :
Thanks for the reply and information. Yes, I am able to connect to Databricks from DBeaver over JDBC using the link provided (sorry for the delayed update; I had to try this on the trial version of DBeaver Enterprise). The additional links I followed are https://docs.databricks.com/dev-tools/dbeaver.html and https://databricks.com/spark/jdbc-drivers-download (the download in the primary link points to the ODBC drivers). The JDBC URL may need some changes depending on your Databricks setup; the details are in the second link.
Observations:
07-12-2022 10:04 PM
Hi @Kaniz Fatma , @Prabakar Ammeappin , Good day.
I found that by creating a new project, we can change the save location.
My only question is: is there any way to extract the virtual key relationships defined between tables as well, for example as part of the ER export, and am I exporting the right way?
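If the tool's own export does not cover this, one tool-independent option is to keep the relationships in a small text file next to the model. The sketch below is an illustration of that idea only, not DBeaver's export format: it records each virtual key as a plain record, reports which columns were added or removed after a table change, keeps only the relationships whose columns still exist, and writes them as sorted JSON so Git diffs stay small. All names (`VirtualFK`, `diff_columns`, `surviving_relationships`, `to_git_friendly_json`) and the sample tables are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class VirtualFK:
    """A relationship kept outside the database (hypothetical record)."""
    child_table: str
    child_column: str
    parent_table: str
    parent_column: str


def diff_columns(old, new):
    """Return (added, removed) column names, each as a sorted list."""
    return sorted(set(new) - set(old)), sorted(set(old) - set(new))


def surviving_relationships(fks, schema):
    """Keep only relationships whose columns exist in the current schema."""
    return [
        fk for fk in fks
        if fk.child_column in schema.get(fk.child_table, ())
        and fk.parent_column in schema.get(fk.parent_table, ())
    ]


def to_git_friendly_json(fks):
    """Serialize relationships in a stable order so diffs are minimal."""
    rows = sorted(
        (asdict(fk) for fk in fks),
        key=lambda r: (r["child_table"], r["child_column"]),
    )
    return json.dumps(rows, indent=2, sort_keys=True)


# Sample data: "orders" lost region_id and gained amount.
old_orders = ["order_id", "customer_id", "region_id"]
schema = {
    "orders": ["order_id", "customer_id", "amount"],
    "customers": ["customer_id", "name"],
}
fks = [
    VirtualFK("orders", "customer_id", "customers", "customer_id"),
    VirtualFK("orders", "region_id", "regions", "region_id"),
]

added, removed = diff_columns(old_orders, schema["orders"])
kept = surviving_relationships(fks, schema)
exported = to_git_friendly_json(kept)
print(added, removed)
print(exported)
```

The resulting JSON file can be committed alongside the project, so relationship changes show up in version history even though the tables themselves carry no PK/FK metadata.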
07-12-2022 10:22 PM
@Darshan M S , Excellent, Thanks for the update!
Here is a guide to the relationship between tables in a data model.