Is it no longer necessary to preserve data in its original format when using the medallion architecture?

eimis_pacheco
Contributor

Hi Community 

I have a question. The bronze layer always confuses me. Someone mentioned, "File Format: Store data in Delta Lake format to leverage its performance, ACID transactions, and schema evolution capabilities" for bronze layers.

Does this mean that it is not necessary to preserve the data in its original format? For instance, if it comes in JSON format from the source system, or if we are exporting it from the source database as CSV files compressed into zip archives?

This part confused me. Should we not store the data in its original format, as per the medallion architecture? And should we rely only on the bronze layer for data history, lineage, audit, and reprocessing?

Thank you very much in advance for clarifying this for me.

Best Regards

#medallionarchitecture

Hi @Retired_mod 

Just one last question: what would happen if someone decided to rename a column in the source system? For example, if someone renames the column "ID" to "cust_id" in the customer table, how will the Delta Lake format know that the values in the "cust_id" column refer to the same values as in the "ID" column, considering the statement "while adding additional features such as versioning, schema enforcement, etc."?
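
To make the scenario concrete, here is a minimal plain-Python sketch of what I mean. It is not Delta Lake code and makes no claim about how Delta Lake behaves; the records, the `COLUMN_ALIASES` mapping, and the `normalize` helper are all hypothetical names made up for illustration. It only shows that, without some explicit mapping, the old and new exports look like they have two unrelated columns.

```python
# Hypothetical records: the source renamed "ID" to "cust_id" between two exports.
old_batch = [{"ID": 1, "name": "Ana"}, {"ID": 2, "name": "Luis"}]
new_batch = [{"cust_id": 3, "name": "Eva"}]

# Assumed, hand-maintained mapping from old source column names to current ones.
COLUMN_ALIASES = {"ID": "cust_id"}


def normalize(record, aliases=COLUMN_ALIASES):
    """Return a copy of the record with old column names replaced by current ones."""
    return {aliases.get(key, key): value for key, value in record.items()}


def columns(records):
    """Collect the set of column names seen across all records."""
    return {key for record in records for key in record}


# Naively combining both exports keeps "ID" and "cust_id" as separate columns.
naive = old_batch + new_batch

# Applying the alias mapping first yields a single, consistent key column.
reconciled = [normalize(record) for record in naive]

print(sorted(columns(naive)))       # ['ID', 'cust_id', 'name']
print(sorted(columns(reconciled)))  # ['cust_id', 'name']
```

In this sketch the link between "ID" and "cust_id" exists only because someone wrote it down in the mapping, which is exactly the part I am unsure about.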

Thank you once more for your valuable insight.

Regards

#medallionarchitecture 

Thank you very much for your answers and insights, @Retired_mod.

Regards!