What's the best way to organize our data lake and Delta Lake setup? We’re trying to use the Bronze, Silver, and Gold classification strategy. The main question is: how do we know which layer a dataset belongs to inside Databricks if there’s no physical place called Bronze, Silver, or Gold? What naming conventions or strategies does Databricks recommend?
Hi @Josephine Ho, naming conventions and coding standards for database objects are crucial to keeping a data engineering project consistent, readable, and manageable.
In Databricks, you can apply such naming conventions and coding standards to each of the Bronze, Silver, and Gold layers.
Following these naming conventions and coding standards allows you to maintain a well-structured, easily understandable, and maintainable data engineering project in Databricks.
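One way to make the layer visible without a physical "bronze" folder is to encode it in the object name itself. The sketch below assumes a Unity Catalog style three-level namespace (catalog.schema.table), one catalog per environment, and the layer used as the schema name. TableName and layer_of are hypothetical helpers, not a Databricks API, and the convention is an illustration rather than an official Databricks recommendation.

```python
from dataclasses import dataclass

# Hypothetical convention: the medallion layer is the schema name, so the
# layer of any table can be read from its three-level name alone.
LAYERS = ("bronze", "silver", "gold")


@dataclass(frozen=True)
class TableName:
    catalog: str  # e.g. one catalog per environment: dev, test, prod
    layer: str    # one of LAYERS; used as the schema name
    source: str   # source system or business domain, e.g. crm
    entity: str   # what the table holds, e.g. customers

    def __post_init__(self) -> None:
        if self.layer not in LAYERS:
            raise ValueError(f"unknown layer: {self.layer!r}")
        # Require lowercase, identifier-style parts so names stay consistent.
        for part in (self.catalog, self.source, self.entity):
            if not part.isidentifier() or part != part.lower():
                raise ValueError(f"use lowercase identifier-style names: {part!r}")

    def fqn(self) -> str:
        """Return <catalog>.<layer>.<source>_<entity>."""
        return f"{self.catalog}.{self.layer}.{self.source}_{self.entity}"


def layer_of(fqn: str) -> str:
    """Read the layer back from a name produced by TableName.fqn()."""
    _catalog, schema, _table = fqn.split(".")
    if schema not in LAYERS:
        raise ValueError(f"schema {schema!r} is not a medallion layer")
    return schema


if __name__ == "__main__":
    name = TableName("dev", "bronze", "crm", "customers")
    print(name.fqn())                             # dev.bronze.crm_customers
    print(layer_of("prod.silver.crm_customers"))  # silver
```

A common variant is one catalog per layer (for example bronze.crm.customers); only the position that carries the layer would change.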
@Kaniz, thank you for the detailed guidelines on naming conventions for the Bronze, Silver, and Gold layers in Databricks. These conventions are certainly valuable for maintaining consistency and manageability.
I'd like to ask about best practices for structuring database and schema names, especially for managed tables within the Medallion Architecture in Delta Lake.
With unmanaged tables, the folder structure allows us to segregate the Gold, Silver, and Bronze layers effectively. However, with managed tables, we don't have control over the folder structure.
Is there a difference in naming conventions between managed and unmanaged tables, particularly when implementing the Medallion Architecture? Could you please share insights or recommendations on how to approach this so the data engineering environment stays well structured and maintainable?
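With Unity Catalog, managed tables can still be kept in a layer-specific storage path by giving each schema (or catalog) its own MANAGED LOCATION, so the same layer-as-schema naming can apply to both managed and external tables. The sketch below only builds the SQL strings; STORAGE_ROOT, schema_ddl, and all_schema_ddl are hypothetical, and actually running the statements assumes Unity Catalog plus an external location that covers the chosen path.

```python
# Hypothetical storage root; in a real workspace it must be covered by a
# Unity Catalog external location you are allowed to use.
STORAGE_ROOT = "abfss://lake@examplestorage.dfs.core.windows.net"
LAYERS = ("bronze", "silver", "gold")


def schema_ddl(catalog: str, layer: str, root: str = STORAGE_ROOT) -> str:
    """Build a CREATE SCHEMA statement whose managed tables are stored under a
    layer-specific path, so the physical layout mirrors the medallion layers."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    location = f"{root.rstrip('/')}/{catalog}/{layer}"
    return (
        f"CREATE SCHEMA IF NOT EXISTS {catalog}.{layer} "
        f"MANAGED LOCATION '{location}'"
    )


def all_schema_ddl(catalog: str, root: str = STORAGE_ROOT) -> list:
    """One statement per layer, in bronze, silver, gold order."""
    return [schema_ddl(catalog, layer, root) for layer in LAYERS]


if __name__ == "__main__":
    for stmt in all_schema_ddl("dev"):
        # In a notebook, each string could be passed to spark.sql(stmt).
        print(stmt)
```

Tables created as managed in dev.bronze would then land under .../dev/bronze without anyone managing folders per table.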
Your guidance on this matter would be greatly appreciated.
I have a question. The bronze layer always confuses me. You mentioned "File Format: Store data in Delta Lake format to leverage its performance, ACID transactions, and schema evolution capabilities" for the Silver layer.
Does this mean it isn't necessary to preserve the data in its original format? For instance, what if it arrives as JSON from the source system, or we export it from the source database as CSV files compressed in zip archives?
This part confused me: shouldn't we store the data in its original format, as the Medallion Architecture suggests? And should we rely only on the bronze layer for data history, lineage, auditing, and reprocessing?
Thank you very much in advance for clarifying this for me.
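On the raw-format question, one widely used pattern is to keep the files exactly as delivered (JSON, zipped CSV) in a landing area, and have the bronze table hold the same records with no cleaning or type casting, plus lineage columns such as the source file and ingestion time; typing, deduplication, and validation then happen in Silver. The pure-Python sketch below only illustrates the shape of such a bronze record. In Databricks the load would typically use Spark (for example Auto Loader) writing to a Delta table, and zip archives may need to be extracted first. bronze_rows_from_zip, demo_zip, and the _source_file / _ingested_at column names are hypothetical.

```python
import csv
import io
import zipfile
from datetime import datetime, timezone


def bronze_rows_from_zip(zip_bytes, archive_name, ingested_at=None):
    """Yield one dict per CSV row in the archive. Source values stay as the
    original strings (no casting or cleaning); only lineage columns are added.
    The archive itself is never modified, so the raw copy remains available."""
    ingested_at = ingested_at or datetime.now(timezone.utc)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for member in sorted(zf.namelist()):
            if not member.lower().endswith(".csv"):
                continue  # skip non-data files such as README.txt
            with zf.open(member) as fh:
                text = io.TextIOWrapper(fh, encoding="utf-8", newline="")
                for row in csv.DictReader(text):
                    yield {
                        **row,
                        "_source_file": f"{archive_name}/{member}",
                        "_ingested_at": ingested_at.isoformat(),
                    }


def demo_zip():
    """Build a small in-memory archive resembling a zipped CSV export."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("orders.csv", "order_id,amount\n1,10.50\n2,7\n")
        zf.writestr("README.txt", "not data")
    return buf.getvalue()


if __name__ == "__main__":
    for r in bronze_rows_from_zip(demo_zip(), "export.zip"):
        print(r)
```

Because bronze keeps values as delivered, a bug found later in Silver logic can be fixed and replayed from bronze, while the landing files remain the ultimate fallback.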
Hi @ramdhilip,