06-28-2023 07:23 AM
Why do we need tiers of data? Why can't we just have all the data go to one tier and just work off of that?
06-28-2023 05:57 PM
Here are a few reasons data tiers are needed.
1. Performance Optimization: Different tiers of data allow for optimized performance based on the specific needs of each tier. For example, high-priority or frequently accessed data can be stored in a high-performance tier with faster access times and processing capabilities. This ensures that critical data is readily available and can be processed quickly, resulting in improved operational efficiency.
2. Resource Allocation: Data tiers enable organizations to allocate resources such as storage, computing power, and bandwidth more efficiently. Not all data requires the same level of resources. By segregating data into different tiers, organizations can match resource allocation to the specific needs of each tier.
3. Data Retention Policies: Different types of data may have varying retention requirements based on legal, compliance, or business needs. Tiers of data facilitate the implementation of data retention policies.
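To make the resource-allocation and retention points concrete, here is a minimal sketch in plain Python. The tier names, storage labels, and retention periods are hypothetical values picked for illustration; they are not Databricks defaults or recommendations.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class TierPolicy:
    storage_class: str   # hypothetical label for the kind of storage a tier uses
    retention_days: int  # how long data in this tier is kept


# Hypothetical per-tier settings: each tier gets its own resources and retention.
POLICIES = {
    "bronze": TierPolicy(storage_class="cold", retention_days=90),
    "silver": TierPolicy(storage_class="standard", retention_days=365),
    "gold": TierPolicy(storage_class="hot", retention_days=730),
}


def is_expired(tier: str, ingested_on: date, today: date) -> bool:
    """Return True when a record is older than its tier's retention period."""
    policy = POLICIES[tier]
    return today - ingested_on > timedelta(days=policy.retention_days)


ingested = date(2023, 1, 1)
today = date(2023, 6, 28)
for tier in POLICIES:
    print(tier, POLICIES[tier].storage_class, is_expired(tier, ingested, today))
```

With a single tier there would be only one policy to apply to everything, so raw and curated data would have to share the same storage class and retention period.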
06-28-2023 08:40 AM
Per the Databricks documentation, the goal of the layers is to incrementally and progressively improve the structure and quality of data as it flows through each layer of the architecture.
Most of the time, raw data is not useful as-is and needs to be cleaned or supplemented with other data sets.
You can store everything in one layer, but it's easier to understand and manage if the layers are kept separate. The separation can be logical or physical; it's really your choice. You will find use cases where someone needs to access bronze data for their gold use cases, and that data may have quality issues.
06-28-2023 10:18 PM
In addition to the reasons mentioned such as resource allocation, performance optimization and retention, there are also aspects of data curation that are to be considered here.
The bronze layer is often very close to the source, which enables replayability and provides a point for debugging when upstream systems aren't accessible. The silver layer enables deduplication and curation per enterprise needs, while the base copy remains available in bronze for access as required.
The gold layer enables data blending, lookup, and enrichment of datasets for various use cases.
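As a rough illustration of how the three layers relate, here is a minimal sketch in plain Python. In practice each layer would usually be a table processed with Spark; the lists and dicts below are stand-ins, and the record fields, the `customer_regions` lookup, and the function names are all hypothetical.

```python
from collections import defaultdict

# Bronze: raw records kept as they arrived, including duplicates and bad values.
bronze = [
    {"order_id": "1", "customer_id": "c1", "amount": "10.50"},
    {"order_id": "1", "customer_id": "c1", "amount": "10.50"},  # duplicate delivery
    {"order_id": "2", "customer_id": "c2", "amount": "bad"},    # malformed amount
    {"order_id": "3", "customer_id": "c1", "amount": "4.50"},
]


def to_silver(rows):
    """Deduplicate on order_id and drop rows whose amount cannot be parsed."""
    seen = set()
    cleaned = []
    for row in rows:
        if row["order_id"] in seen:
            continue
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # the rejected row is still available in bronze for debugging
        seen.add(row["order_id"])
        cleaned.append({
            "order_id": int(row["order_id"]),
            "customer_id": row["customer_id"],
            "amount": amount,
        })
    return cleaned


# Hypothetical lookup data used for enrichment in the gold layer.
customer_regions = {"c1": "EMEA", "c2": "APAC"}


def to_gold(silver_rows, lookup):
    """Enrich each order with a region and aggregate revenue per region."""
    totals = defaultdict(float)
    for row in silver_rows:
        region = lookup.get(row["customer_id"], "unknown")
        totals[region] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver, customer_regions)
print(len(bronze), len(silver), gold)
```

Because `bronze` is never modified, silver and gold can be rebuilt from it at any time if the cleaning or enrichment rules change, which is the replayability point made above.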
06-29-2023 01:17 PM