Databricks

ArunSharma · 04-09-2023

Please help me with database object naming conventions and coding standards for the Bronze, Silver and Gold layers.

Kaniz · 04-11-2023

Hi @Arun Sharma, database object naming conventions and coding standards are crucial for maintaining consistency, readability, and manageability in a data engineering project.

In Databricks, you can apply the following naming conventions and coding standards to the Bronze, Silver, and Gold layers.

General Naming Conventions:

- Use lowercase letters for all object names (tables, views, columns, etc.).
- Separate words with underscores for readability.
- Be descriptive and concise. Use names that indicate the purpose of the object.
- Avoid using reserved keywords or special characters.
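The general rules above can be checked automatically before a table or column is created. This is a minimal sketch in plain Python: `to_snake_case`, `check_object_name`, and the `RESERVED_WORDS` set are hypothetical helpers, and the reserved-word set is a deliberately small illustrative subset, not the full list for Databricks SQL.

```python
import re
from typing import List

# Lowercase words of letters/digits joined by single underscores, starting with a letter.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")

# Illustrative subset only (assumption); check the full reserved-word list for your SQL dialect.
RESERVED_WORDS = {"select", "from", "where", "table", "group", "order", "join", "union"}


def to_snake_case(text: str) -> str:
    """Lowercase text and join its alphanumeric runs with single underscores.

    Note: camelCase is not split ("OrderDate" becomes "orderdate").
    """
    return "_".join(word.lower() for word in re.findall(r"[A-Za-z0-9]+", text))


def check_object_name(name: str) -> List[str]:
    """Return a list of convention violations; an empty list means the name passes."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("use lowercase letters, digits and single underscores, starting with a letter")
    if name in RESERVED_WORDS:
        problems.append("avoid reserved keywords")
    return problems
```

For example, `check_object_name("bronze_salesforce_opportunities")` returns an empty list, while `"Sales-Data"` and `"select"` each return one violation.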

Bronze Layer (Raw Data Layer):

- Table Naming Convention: Use the prefix "bronze_" followed by the source system and the object name; for example, bronze_salesforce_opportunities.
- File Format: Store data in Delta Lake format to leverage its performance, ACID transactions, and schema evolution capabilities.
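- Partitioning: Use partition columns that best suit your data access patterns, such as a date. If the source only has an event timestamp, derive a date from it rather than partitioning on the raw timestamp, which would create far too many small partitions.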
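The partitioning advice above, deriving a low-cardinality date from a timestamp, can be sketched in plain Python as below. The `event_ts` and `event_date` column names are hypothetical, and the code assumes that naive timestamps are already in UTC.

```python
from datetime import datetime, timezone


def partition_date(event_ts: datetime) -> str:
    """Return the UTC calendar date of event_ts as 'YYYY-MM-DD'.

    Assumption: naive timestamps (no tzinfo) are already in UTC.
    """
    if event_ts.tzinfo is None:
        event_ts = event_ts.replace(tzinfo=timezone.utc)
    return event_ts.astimezone(timezone.utc).date().isoformat()


def add_partition_column(record: dict, ts_field: str = "event_ts") -> dict:
    """Copy a raw record and add a hypothetical 'event_date' partition column."""
    enriched = dict(record)  # Bronze keeps the raw fields unchanged
    enriched["event_date"] = partition_date(record[ts_field])
    return enriched
```

In a Spark notebook, the same idea would typically use `pyspark.sql.functions.to_date` to derive the column and `df.write.format("delta").partitionBy("event_date")` to write the Delta table.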

Silver Layer (Cleansed and Enriched Data Layer):

- Table Naming Convention: Use the prefix "silver_" followed by the functional area or business domain and the object name; for example, silver_finance_transactions.
- File Format: Use Delta Lake format for storing the data.
- Partitioning: Choose appropriate partition columns, considering data access patterns and performance implications.
- Data Cleansing and Enrichment: Apply necessary data quality checks, type conversions, and enrichment processes.
- Documentation: Document the transformation logic and any assumptions made during the cleansing and enrichment process.
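The cleansing step above (quality checks plus type conversion) can be sketched in plain Python as follows. The field names `transaction_id`, `transaction_date`, and `amount` are hypothetical, chosen to match the silver_finance_transactions example. Instead of silently dropping rows that fail a check, the sketch returns them with a reason so they can be quarantined and reviewed.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal, InvalidOperation


@dataclass(frozen=True)
class Transaction:
    """Typed Silver row (hypothetical schema)."""
    transaction_id: str
    transaction_date: date
    amount: Decimal


def cleanse(raw_rows):
    """Split raw dict rows into (valid typed rows, rejected (row, reason) pairs)."""
    valid, rejected = [], []
    for row in raw_rows:
        txn_id = str(row.get("transaction_id") or "").strip()
        if not txn_id:
            rejected.append((row, "missing transaction_id"))
            continue
        try:
            txn_date = date.fromisoformat(str(row.get("transaction_date", "")).strip())
        except ValueError:
            rejected.append((row, "invalid transaction_date"))
            continue
        try:
            amount = Decimal(str(row.get("amount", "")).strip())
        except InvalidOperation:
            rejected.append((row, "invalid amount"))
            continue
        if not amount.is_finite():  # reject NaN / Infinity
            rejected.append((row, "invalid amount"))
            continue
        valid.append(Transaction(txn_id, txn_date, amount))
    return valid, rejected
```

Recording the rejection reason alongside each rejected row makes the cleansing rules self-documenting, which supports the documentation point above.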

Gold Layer (Aggregated and Business-Ready Data Layer):

- Table Naming Convention: Use the prefix "gold_" followed by the functional area or business domain and the object name; for example, gold_sales_monthly_summary.
- File Format: Store the data in Delta Lake format.
- Partitioning: Choose appropriate partition columns, considering data access patterns and performance implications.
- Aggregations: Perform the aggregations and calculations that the business requirements call for.
- Documentation: Document the aggregation logic and any assumptions made during the process.
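As a sketch of the kind of aggregation a table like gold_sales_monthly_summary might hold, the plain-Python function below groups sales by month and region, then totals amount and order count. The input shape `(sale_date, region, amount)` and the output column names are hypothetical.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal


def monthly_summary(sales):
    """Aggregate (sale_date, region, amount) tuples into monthly summary rows."""
    totals = defaultdict(lambda: [Decimal("0"), 0])  # key -> [total_amount, order_count]
    for sale_date, region, amount in sales:
        key = (sale_date.strftime("%Y-%m"), region)
        totals[key][0] += amount
        totals[key][1] += 1
    # Sort by (month, region) so the output order is deterministic.
    return [
        {"sales_month": month, "region": region, "total_amount": total, "order_count": count}
        for (month, region), (total, count) in sorted(totals.items())
    ]
```

In Spark, the equivalent would usually be a `groupBy(...).agg(...)` over the Silver table using `pyspark.sql.functions.sum` and `count`. The grouping keys and measures are the aggregation logic worth documenting.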

Code Organization:

- Separate your code into different notebooks for each layer (Bronze, Silver, Gold) and maintain a clear hierarchy for ease of maintenance.
- Follow code formatting and readability best practices, such as meaningful comments, consistent indentation, and modularization.
- Use version control systems like Git to manage your codebase and track changes.
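One way to combine per-layer notebooks with modularization is a small shared helper that every layer imports, so the bronze_/silver_/gold_ prefix rules live in one place. This is a hypothetical sketch: the `table_name` function and `LAYERS` tuple are illustrative, and in Databricks such a module could be a Python file in a Git-backed repo or a notebook pulled in with `%run`.

```python
# Hypothetical shared naming helper used by the Bronze, Silver and Gold notebooks.
LAYERS = ("bronze", "silver", "gold")


def table_name(layer: str, qualifier: str, obj: str) -> str:
    """Build <layer>_<qualifier>_<object>.

    qualifier is the source system for Bronze and the business domain for Silver/Gold.
    """
    layer = layer.strip().lower()
    if layer not in LAYERS:
        raise ValueError(f"unknown layer {layer!r}; expected one of {LAYERS}")
    parts = [qualifier.strip().lower(), obj.strip().lower()]
    if not all(parts):
        raise ValueError("qualifier and object name must be non-empty")
    # Spaces inside a part become underscores, e.g. "monthly summary" -> "monthly_summary".
    return "_".join([layer] + [p.replace(" ", "_") for p in parts])
```

For example, `table_name("gold", "sales", "monthly summary")` produces `gold_sales_monthly_summary`, and an unknown layer such as `"platinum"` raises `ValueError`.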

Following these naming conventions and coding standards allows you to maintain a well-structured, easily understandable, and maintainable data engineering project in Databricks.

Anonymous · 04-12-2023

Hi @Arun Sharma

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance!
