Best practices around bronze/silver/gold (medallion model) data lake classification?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
โ06-07-2021 09:57 AM
What's the best way to organize our data lake and delta setup? Weโre trying to use the bronze, silver and gold classification strategy. The main question is how do we know what classification the data is inside Databricks if thereโs no actual physical place called bronze, silver and gold? What are the naming conventions/strategies recommended by Databricks?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
โ08-13-2023 01:10 AM
@Retired_mod , Thank you for the detailed guidelines on naming conventions for the Bronze, Silver, and Gold layers in Databricks. These conventions are certainly valuable for maintaining consistency and manageability.
I'd like to inquire about the best practices for structuring the Database and Schema names, especially in the context of managed tables within the Medallion Architecture in Delta Lake.
With unmanaged tables, the folder structure allows us to segregate the Gold, Silver, and Bronze layers effectively. However, with managed tables, we don't have control over the folder structure.
Is there a difference in maintaining the naming convention between Managed or Unmanaged tables, particularly in implementing the Medallion Architecture? Could you please provide insights or recommendations on how to approach this to ensure a well-structured and maintainable data engineering environment?
Your guidance on this matter would be greatly appreciated.
Thank you!
Ram
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
โ09-18-2023 06:26 PM
Hi @Retired_mod,
I have a doubt. The bronze layer always causes confusion for me. You mentioned, "File Format: Store data in Delta Lake format to leverage its performance, ACID transactions, and schema evolution capabilities" for silver layers.
Then, does this mean that is not needed to preserve the data in its original format? for instance, if this comes in JSON format from the source system or if we are exporting this data from the source database in CSV format compressed in zip files?
This part confused me, should we not store the data in its original format as per the medallion architecture? and should we only rely on the bronze layer for data history, lineage, audit, and reprocessing?
Thank you very much in advance for clarify this for me.
Best Regards
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
โ09-19-2023 03:24 AM
with Unity taking into account, it is certainly a good idea to think about your physical data storage.
As you cannot have overlap between volumes and tables this can become cumbersome.
F.e. we used to store delta tables of a data object in the same directory as your ingested files.
With unity, this structure is now impossible.
So I'd create a separate container for tables and one for volumes, to avoid this overlap.
This is of course easier said than done on an existing environment.
As much as I like Unity, it does give me a lot of headaches because we have to do serious refactoring to embrace Unity.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
โ01-15-2025 02:01 AM
Has the reply from @Retired_mod been removed?

