Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Is the promise of a data lake simply about data science, data analytics and data quality, or can it also be an integral part of core transaction processing?

MarcJustice
New Contributor

Upfront, I want to let you know that I'm not a veteran data jockey, so I apologize if this topic has been covered already or is simply too basic or narrow for this community. That said, I do need help so please feel free to point me in another direction if that's appropriate.

To achieve the benefits of data science and data analytics and to facilitate data quality, my company has decided to invest in building a data lake. Almost immediately, our application solution engineers observed that they could/should be able to get access to multi-domain and/or mastered single-domain data through data APIs built on top of the lake, rather than relying on multiple application APIs or consuming APIs built on unmastered/uncertified data in source systems. Assuming that one of the primary goals of the data lake is improving data quality, how can you introduce data quality rules at scale without creating a version control problem in your API catalog, one that your application owners ultimately can't keep up with and that really just becomes tech debt? The promise of the lake can't simply be about science and analytics, can it?

3 REPLIES

Aashita
Contributor III

@Marc Barnett, Databricks’ Lakehouse architecture is the ideal data architecture for data-driven organizations. It combines the best qualities of data warehouses and data lakes to provide a single solution for all major data workloads, and it supports use cases from streaming analytics to BI, data science, and AI. To understand what a Lakehouse architecture is, start with this: https://databricks.com/blog/2020/01/30/what-is-a-data-lakehouse.html?itm_data=lakehouse-link-lakehou...

  1. Data quality can be taken care of by Databricks’ Delta Live Tables. Here is a comparison of traditional ETL with what Databricks offers: https://databricks.com/blog/2021/09/08/5-steps-to-implementing-intelligent-data-pipelines-with-delta...
  2. For version control: https://docs.databricks.com/notebooks/github-version-control.html
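
To make the data quality point a bit more concrete: Delta Live Tables lets you declare quality rules (it calls them "expectations") alongside the pipeline itself. The sketch below is not the Delta Live Tables API. It is a plain-Python illustration of the same idea, so it runs anywhere. The rule names, the record fields (`customer_id`, `email`), and the helper `apply_rules` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]


@dataclass(frozen=True)
class Rule:
    """A named data quality rule: a predicate that a valid record satisfies."""
    name: str
    check: Callable[[Record], bool]


# Hypothetical rules, kept in one place so they can be reviewed and
# versioned together with the pipeline code rather than per consuming API.
RULES: List[Rule] = [
    Rule("customer_id_present", lambda r: r.get("customer_id") is not None),
    Rule(
        "email_has_at_sign",
        lambda r: isinstance(r.get("email"), str) and "@" in r["email"],
    ),
]


def apply_rules(
    records: List[Record], rules: List[Rule]
) -> Tuple[List[Record], Dict[str, int]]:
    """Keep records that pass every rule; count failures per rule."""
    passed: List[Record] = []
    failures: Dict[str, int] = {rule.name: 0 for rule in rules}
    for record in records:
        failed = [rule.name for rule in rules if not rule.check(record)]
        for name in failed:
            failures[name] += 1
        if not failed:
            passed.append(record)
    return passed, failures


# Made-up sample data, for illustration only.
sample = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": None, "email": "b@example.com"},
    {"customer_id": 3, "email": "not-an-email"},
]
clean, failure_counts = apply_rules(sample, RULES)
```

Because the rules live with the pipeline that produces the curated data, downstream data APIs can read only the records that passed, and a new rule changes the data rather than forcing a new version of every API contract. Whether that holds in your case depends on how your API catalog is organized.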

Let me know if you would like to know about something specific in detail. We are here to help!

Kaniz_Fatma
Community Manager

Hi @Marc Barnett, just a friendly follow-up. Do you still need help, or did @Aashita Ramteke's response help you find the solution? Please let us know.

Hi @Marc Barnett​,

Just a friendly follow-up. Did any of the responses help you to resolve your question?
