Lakehouse Concept
06-22-2021 11:11 PM
I want to understand the lakehouse concept very briefly, as if I had to pitch it to a customer in one minute.
06-22-2021 11:11 PM
A lakehouse is a concept defined by the following characteristics:
- Data is stored in an open standard format.
- Data is stored in a way that supports data science, ML, and BI workloads.
- Delta is an engine on top of cloud storage that provides control over the data, prevents it from becoming a data swamp, improves performance, and supports SQL-like queries.
- For a lakehouse, it is recommended to have three layers:
  - Bronze: raw data, as is, from the OLTP source systems.
  - Silver: data in a curated format, filtered so that no junk data reaches this layer. This layer is best suited for data science and ML.
  - Gold: purely aggregated data that serves BI and can also be used for machine learning.
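To make the three layers concrete, here is a minimal sketch of the bronze to silver to gold flow. It uses plain Python lists and dictionaries rather than Delta tables, purely to illustrate the idea; the sample records, the field names (`order_id`, `region`, `amount`), and the helper functions `to_silver` and `to_gold` are all hypothetical.

```python
from collections import defaultdict

# Bronze: raw records exactly as they arrived, junk rows included.
# (Hypothetical sample data; in a real lakehouse this would be a table.)
bronze = [
    {"order_id": 1, "region": "EU", "amount": "120.50"},
    {"order_id": 2, "region": "US", "amount": "80.00"},
    {"order_id": 3, "region": None, "amount": "15.00"},  # missing region
    {"order_id": 4, "region": "EU", "amount": "oops"},   # unparseable amount
    {"order_id": 5, "region": "US", "amount": "19.50"},
]


def to_silver(rows):
    """Curate bronze rows: drop junk and cast fields to proper types."""
    silver = []
    for row in rows:
        if not row["region"]:
            continue  # filter: rows without a region never reach silver
        try:
            amount = float(row["amount"])
        except (TypeError, ValueError):
            continue  # filter: rows with a bad amount never reach silver
        silver.append(
            {"order_id": row["order_id"], "region": row["region"], "amount": amount}
        )
    return silver


def to_gold(rows):
    """Aggregate silver rows into a BI-friendly total per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'EU': 120.5, 'US': 99.5}
```

Bronze is never modified here: silver and gold are derived from it, so the raw data stays available if the curation rules change later.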
11-02-2022 02:28 PM
Regarding the use of the metastore for a data lakehouse, I have several follow-up questions to your answer:
- Will companies generally have only one database that represents the data lakehouse?
- Will bronze tables be kept in that database, in a separate database just for source data, or with no database assignment at all?
- Will files brought in as bronze tables be converted to Delta/Parquet?
- Will tables created in Silver tier be named with silver as a prefix or suffix, and if not how can we differentiate the silver tables from the gold?
I have not seen best-practice naming conventions for this anywhere.