06-07-2021 09:57 AM
What's the best way to organize our data lake and Delta setup? We're trying to use the bronze, silver, and gold classification strategy. The main question is: how do we know which classification the data belongs to inside Databricks if there's no actual physical place called bronze, silver, or gold? What naming conventions or strategies does Databricks recommend?
04-11-2023 12:47 AM
Hi @Josephine Ho, naming conventions and coding standards for database objects are crucial to maintaining consistency, readability, and manageability in a data engineering project.
In Databricks, you can apply naming conventions and coding norms to each of the Bronze, Silver, and Gold layers.
Following these naming conventions and coding standards allows you to maintain a well-structured, easily understandable, and maintainable data engineering project in Databricks.
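As a rough illustration of how the layer can stay visible even without a physical "bronze" folder, here is a minimal Python sketch that encodes the layer in the schema part of a three-level table name. The pattern `<catalog>.<layer>.<source>_<entity>` and the helper names `qualified_table_name` and `layer_of` are assumptions made for this example, not an official Databricks standard.

```python
import re

# Hypothetical convention: one schema per medallion layer,
# giving names of the form <catalog>.<layer>.<source>_<entity>.
LAYERS = ("bronze", "silver", "gold")


def _normalize(part: str) -> str:
    """Lowercase a name part and replace runs of other characters with '_'."""
    cleaned = re.sub(r"[^a-z0-9_]+", "_", part.strip().lower()).strip("_")
    if not cleaned:
        raise ValueError(f"empty name part: {part!r}")
    return cleaned


def qualified_table_name(catalog: str, layer: str, source: str, entity: str) -> str:
    """Build a three-level table name with the layer as the schema."""
    layer_name = _normalize(layer)
    if layer_name not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    table = f"{_normalize(source)}_{_normalize(entity)}"
    return ".".join([_normalize(catalog), layer_name, table])


def layer_of(table_name: str) -> str:
    """Read the layer back from a name that follows the convention above."""
    parts = table_name.split(".")
    if len(parts) != 3 or parts[1] not in LAYERS:
        raise ValueError(f"name does not follow the convention: {table_name!r}")
    return parts[1]


if __name__ == "__main__":
    name = qualified_table_name("main", "Bronze", "Sales CRM", "Orders")
    print(name, "->", layer_of(name))
```

Because the layer lives in the name rather than in a folder path, the same convention works whether the table is managed or unmanaged.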
08-13-2023 01:10 AM
@Kaniz_Fatma, thank you for the detailed guidelines on naming conventions for the Bronze, Silver, and Gold layers in Databricks. These conventions are certainly valuable for maintaining consistency and manageability.
I'd like to inquire about the best practices for structuring the Database and Schema names, especially in the context of managed tables within the Medallion Architecture in Delta Lake.
With unmanaged tables, the folder structure allows us to segregate the Gold, Silver, and Bronze layers effectively. However, with managed tables, we don't have control over the folder structure.
Is there a difference in how the naming convention should be maintained for managed versus unmanaged tables, particularly when implementing the Medallion Architecture? Could you please provide insights or recommendations on how to approach this to ensure a well-structured and maintainable data engineering environment?
Your guidance on this matter would be greatly appreciated.
Thank you!
Ram
09-18-2023 06:26 PM
Hi @Kaniz_Fatma,
I have a question. The bronze layer always causes confusion for me. You mentioned, "File Format: Store data in Delta Lake format to leverage its performance, ACID transactions, and schema evolution capabilities" for silver layers.
Does this mean it is not necessary to preserve the data in its original format? For instance, what if it arrives in JSON format from the source system, or if we export it from the source database as CSV files compressed in zip archives?
This part confused me. Should we not store the data in its original format, as per the medallion architecture? And should we rely only on the bronze layer for data history, lineage, audit, and reprocessing?
Thank you very much in advance for clarifying this for me.
Best Regards
08-13-2023 12:54 PM - edited 08-13-2023 12:55 PM
Hi @ramdhilip,
09-19-2023 03:24 AM
With Unity Catalog taken into account, it is certainly a good idea to think about your physical data storage.
Because volumes and tables cannot overlap, this can become cumbersome.
For example, we used to store the Delta tables of a data object in the same directory as its ingested files.
With Unity, this structure is no longer possible.
So I'd create one container for tables and a separate one for volumes to avoid this overlap.
This is of course easier said than done in an existing environment.
As much as I like Unity, it gives me a lot of headaches because we have to do serious refactoring to embrace it.
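To make the overlap problem concrete, here is a small Python sketch that flags table locations nested inside volume locations, or the other way round, before a migration. It only compares path strings and does not call any Databricks API. The function names and the storage account `acct` in the usage example are hypothetical.

```python
from __future__ import annotations


def _components(location: str) -> tuple[str, ...]:
    """Split a storage location into its non-empty path components."""
    return tuple(part for part in location.rstrip("/").split("/") if part)


def overlaps(location_a: str, location_b: str) -> bool:
    """True if one location equals the other or is nested inside it."""
    a, b = _components(location_a), _components(location_b)
    shared = min(len(a), len(b))
    # Compare whole components so that ".../sales" and ".../sales_raw" stay distinct.
    return a[:shared] == b[:shared]


def find_overlaps(table_locations: list[str], volume_locations: list[str]) -> list[tuple[str, str]]:
    """Return every (table, volume) pair whose locations overlap."""
    return [
        (table, volume)
        for table in table_locations
        for volume in volume_locations
        if overlaps(table, volume)
    ]


if __name__ == "__main__":
    # Hypothetical layout: Delta tables stored next to the ingested files.
    tables = ["abfss://lake@acct.dfs.core.windows.net/sales/orders/delta"]
    volumes = ["abfss://lake@acct.dfs.core.windows.net/sales/orders"]
    for table, volume in find_overlaps(tables, volumes):
        print(f"overlap: {table} is under {volume}")
```

With one container for tables and another for volumes, as suggested above, the first differing path component is the container itself, so no pair is reported.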