Community Platform Discussions
Connect with fellow community members to discuss general topics related to the Databricks platform, industry trends, and best practices. Share experiences, ask questions, and foster collaboration within the community.

Is it no longer necessary to preserve data in its original format when using the medallion architecture?

eimis_pacheco
Contributor

Hi Community 

I have a question. The bronze layer always confuses me. Someone gave this guideline for the bronze layer: "File Format: Store data in Delta Lake format to leverage its performance, ACID transactions, and schema evolution capabilities."

Does this mean it is no longer necessary to preserve the data in its original format? For instance, what if the data arrives as JSON from the source system, or we export it from the source database as CSV files compressed in zip archives?

This part confused me. Under the medallion architecture, should we not store the data in its original format as well? Or should we rely only on the bronze layer for data history, lineage, auditing, and reprocessing?
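One common way to reconcile the two ideas is to keep the original files (for example, the zip archives) in a landing area and load their contents into bronze with as little transformation as possible, plus columns recording where each row came from. The sketch below illustrates that idea with the Python standard library only. It is not Databricks or Delta Lake code: the function name `load_bronze_rows` and the lineage columns `_source_file` and `_ingested_at` are hypothetical, and it assumes the original archive is retained separately.

```python
import csv
import io
import zipfile
from datetime import datetime, timezone


def load_bronze_rows(zip_bytes: bytes, source_name: str) -> list[dict]:
    """Hypothetical bronze-style load: read every CSV in a zip archive as-is.

    Values stay as the raw strings the source delivered (no casting, no
    renaming). Each row gets two lineage columns so it can be traced back
    to the original archive member, which is assumed to be kept elsewhere.
    """
    ingested_at = datetime.now(timezone.utc).isoformat()
    rows = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for member in archive.namelist():
            if not member.lower().endswith(".csv"):
                continue  # skip non-CSV entries in the archive
            with archive.open(member) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                for record in csv.DictReader(text):
                    record["_source_file"] = f"{source_name}/{member}"
                    record["_ingested_at"] = ingested_at
                    rows.append(record)
    return rows


if __name__ == "__main__":
    # Build a small in-memory archive that stands in for a source export.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("customers.csv", "ID,name\n1,Ana\n2,Luis\n")
    for row in load_bronze_rows(buf.getvalue(), "export.zip"):
        print(row)
```

Because bronze only adds metadata and never reshapes the values, the table can be rebuilt from the retained files if needed, and every row points back to the exact file it came from.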

Thank you very much in advance for clarifying this for me.

Best Regards

#medallionarchitecture

1 Accepted Solution

Hi @Retired_mod 

Just one last question: what would happen if someone renamed a column in the source system? For example, if someone renamed the column "ID" to "cust_id" in the customer table, how would Delta Lake know that the values in the "cust_id" column refer to the same values as the "ID" column, given the statement "while adding additional features such as versioning, schema enforcement, etc."?
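To make the rename scenario concrete, here is a toy in-memory model, not Delta Lake's API, of the schema enforcement behavior generally described for Delta: an append containing a column the table does not have is rejected by default; with schema evolution enabled (in Delta, the `mergeSchema` write option), the new column is added and columns missing from the incoming rows become null. In neither mode is "cust_id" recognized as "ID". The class `ToyTable` and the helper `customer_id` are hypothetical names, and resolving the rename in a silver-layer rule is an assumed design choice, not a documented Databricks recommendation.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy model of schema enforcement/evolution on append (not Delta Lake)."""
    columns: list[str]
    rows: list[dict] = field(default_factory=list)

    def append(self, batch: list[dict], merge_schema: bool = False) -> None:
        # Collect columns in the batch that the table has never seen.
        new_cols = []
        for row in batch:
            for col in row:
                if col not in self.columns and col not in new_cols:
                    new_cols.append(col)
        if new_cols and not merge_schema:
            # Enforcement: reject the write instead of guessing what it means.
            raise ValueError(f"schema mismatch, unexpected columns: {new_cols}")
        # Evolution: unknown columns are added as brand-new columns.
        self.columns.extend(new_cols)
        for existing in self.rows:
            for col in new_cols:
                existing.setdefault(col, None)  # old rows get null for new columns
        for row in batch:
            # Table columns missing from the incoming row (e.g. "ID") become None.
            self.rows.append({col: row.get(col) for col in self.columns})


def customer_id(row: dict):
    # Hypothetical silver-layer rule: the rename is resolved explicitly here.
    return row["cust_id"] if row.get("cust_id") is not None else row.get("ID")


if __name__ == "__main__":
    bronze = ToyTable(columns=["ID", "name"])
    bronze.append([{"ID": "1", "name": "Ana"}])
    try:
        bronze.append([{"cust_id": "2", "name": "Luis"}])
    except ValueError as err:
        print(err)  # schema mismatch, unexpected columns: ['cust_id']
    bronze.append([{"cust_id": "2", "name": "Luis"}], merge_schema=True)
    print(bronze.columns)  # ['ID', 'name', 'cust_id']
    print([customer_id(r) for r in bronze.rows])  # ['1', '2']
```

The point of the model is that schema enforcement only detects the change; mapping "cust_id" back to "ID" still has to be written by someone.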

Thank you once again for your valuable insight.

Regards

#medallionarchitecture 


2 Replies


Thank you very much for your answers and insights @Retired_mod

Regards!
