
Keep camelCase column names in the bronze layer, or rename them to snake_case?

param_sen
New Contributor II


Hi dear community,

I am using Databricks Auto Loader to ingest files from Google Cloud Storage (GCS) into Delta tables in the bronze layer of a medallion architecture. According to lakehouse principles, the bronze layer should store raw data with minimal transformation. The incoming files have camelCase column names, but the corresponding table in the silver layer uses snake_case. Should I keep the camelCase column names in the bronze layer, or is it advisable to rename them to match the snake_case convention used in the silver layer?

 

Here are some considerations:

  1. Consistency: Maintaining consistency in naming conventions across layers can make it easier for teams to work with the data. If your silver layer uses snake_case, you might choose to rename the columns in the bronze layer for consistency.

  2. Downstream Processing: If downstream processes or tools expect a specific naming convention, it may be more convenient to align the bronze layer with those expectations.

  3. Documentation: If you decide to keep the camelCase names in the bronze layer, make sure to document this decision clearly. This documentation should be easily accessible to anyone who works with or analyzes the data.

  4. Transformation at Silver Layer: If your data transformation processes are well-defined and centralized, you might prefer to handle the column renaming during the transformation from bronze to silver. This way, the bronze layer remains a true representation of the raw data, and transformations are applied as needed in subsequent layers.
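To make option 4 concrete, here is a minimal sketch of what the rename could look like in the bronze-to-silver step. It is only an illustration under assumptions: `camel_to_snake` and `to_snake_case_columns` are hypothetical helper names, the regex rules are one common way to split camelCase (acronyms such as `ID` are treated as a single word), and the DataFrame is assumed to be a PySpark DataFrame, of which only `.columns` and `.toDF(*names)` are used.

```python
import re

# Two-pass split: first before a capitalised word ("HTTPResponse" -> "HTTP_Response"),
# then between a lowercase letter or digit and an uppercase letter ("orderID" -> "order_ID").
_WORD_START = re.compile(r"(.)([A-Z][a-z]+)")
_LOWER_TO_UPPER = re.compile(r"([a-z0-9])([A-Z])")


def camel_to_snake(name: str) -> str:
    """Convert a camelCase or PascalCase column name to snake_case."""
    partially_split = _WORD_START.sub(r"\1_\2", name)
    return _LOWER_TO_UPPER.sub(r"\1_\2", partially_split).lower()


def to_snake_case_columns(df):
    """Return df with every column renamed to snake_case.

    Assumes df is a PySpark DataFrame (hypothetical usage). Raises ValueError
    if two source columns would map to the same snake_case name, since a
    silent collision would be hard to debug downstream.
    """
    new_names = [camel_to_snake(c) for c in df.columns]
    if len(set(new_names)) != len(new_names):
        raise ValueError(f"Column name collision after renaming: {new_names}")
    return df.toDF(*new_names)
```

With something like this, the bronze table keeps the source's camelCase names unchanged, and the silver job calls `to_snake_case_columns(bronze_df)` before writing, so the renaming rule lives in one place.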

So I am asking this community for opinions, suggestions, and best practices, as I am new to this. Looking forward to your support.

 

Regards,

param_sen

1 REPLY

Dribka
New Contributor III

Hey @param_sen ,

Navigating the nuances of naming conventions, especially when dealing with different layers in a lakehouse architecture, can be a bit of a puzzle. Your considerations are on point. If consistency across layers is a priority and downstream processes or tools are accustomed to snake_case, renaming the columns in the bronze layer might streamline things. Documenting this decision is key, ensuring anyone interacting with the data is on the same page. On the flip side, if the camelCase in the bronze layer aligns with raw data principles and you have well-defined transformation processes in the silver layer, handling the renaming there could maintain the integrity of the raw data. It boils down to balancing consistency, downstream expectations, and the principles of each layer. Best practices can vary, so it might be worth exploring what feels most natural for your specific use case. Cheers!
