Data Engineering
Is the promise of a data lake simply about data science, data analytics, and data quality, or can it also be an integral part of core transaction processing?

MarcJustice
New Contributor

Upfront, I want to let you know that I'm not a veteran data jockey, so I apologize if this topic has been covered already or is simply too basic or narrow for this community. That said, I do need help, so please feel free to point me in another direction if that's appropriate.

To achieve the benefits of data science and data analytics and to facilitate data quality, my company has decided to invest in building a data lake. Almost immediately, our application solution engineers observed that they could (and should) be able to access multi-domain and/or mastered single-domain data through data APIs built on top of the lake, rather than relying on multiple application APIs or consuming APIs built on unmastered/uncertified data in source systems.

Assuming that one of the primary goals of the data lake is improving data quality, how can you introduce data quality rules at scale without creating a version control problem in your API catalog that your application owners ultimately can't keep up with and that really just becomes tech debt? The promise of the lake can't simply be about science and analytics, can it?

3 REPLIES

Aashita
Contributor III

@Marc Barnett​, Databricks' Lakehouse architecture combines the best qualities of data warehouses and data lakes into a single solution for all major data workloads, supporting use cases from streaming analytics to BI, data science, and AI. To understand what a Lakehouse architecture is, start with this: https://databricks.com/blog/2020/01/30/what-is-a-data-lakehouse.html?itm_data=lakehouse-link-lakehou...

  1. Data quality can be handled by Databricks' Delta Live Tables, which lets you declare quality expectations directly in your pipelines. Here is a comparison with traditional ETL and what Databricks offers: https://databricks.com/blog/2021/09/08/5-steps-to-implementing-intelligent-data-pipelines-with-delta...
  2. For version control: https://docs.databricks.com/notebooks/github-version-control.html
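To make the idea concrete: the core of the Delta Live Tables approach is that quality rules live with the data pipeline, in one versioned place, instead of being re-implemented in every downstream API. The sketch below shows that pattern in plain Python; the names `Rule` and `apply_rules` are illustrative inventions, not the DLT API (in DLT itself you would use expectation decorators on pipeline tables).

```python
# Minimal sketch, assuming plain Python dict records. Illustrates centralizing
# data-quality rules at the lake layer so every consumer sees the same
# certified data. Rule and apply_rules are hypothetical names, not a
# Databricks API.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Rule:
    """A named, declarative quality check over a single record."""
    name: str
    check: Callable[[Dict], bool]


def apply_rules(records: List[Dict], rules: List[Rule]) -> Tuple[List[Dict], List[Dict]]:
    """Split records into (certified, quarantined) sets.

    A record is certified only if it passes every rule; otherwise it is
    quarantined for inspection rather than silently dropped.
    """
    certified, quarantined = [], []
    for rec in records:
        if all(rule.check(rec) for rule in rules):
            certified.append(rec)
        else:
            quarantined.append(rec)
    return certified, quarantined


# One shared rule set, versioned alongside the pipeline code.
rules = [
    Rule("has_customer_id", lambda r: r.get("customer_id") is not None),
    Rule("positive_amount", lambda r: r.get("amount", 0) > 0),
]

records = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": None, "amount": 5.0},
]

good, bad = apply_rules(records, rules)
```

Because the rules are data (a versioned list), changing them is a single pipeline change rather than an update fanned out across an API catalog, which speaks directly to the version-control concern in the original question.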

Let me know if you would like to know about something specific in detail. We are here to help!

Kaniz
Community Manager

Hi @Marc Barnett​, just a friendly follow-up. Do you still need help, or did @Aashita Ramteke​'s response help you find a solution? Please let us know.

Hi @Marc Barnett​,

Just a friendly follow-up. Did any of the responses help you resolve your question?
