jcozar
Contributor

Hi, first of all, thank you all in advance! I am very interested in this topic!

My question goes beyond what is described here. Like @Pektas, I am using Debezium to send data from Postgres to a Kafka topic (in fact, Azure Event Hubs). What are the best practices and recommendations for saving the raw data and then implementing a medallion architecture on top of it?

I am using Unity Catalog, but I am thinking about different implementations:

- Use a table or a volume for raw data (if it is a table, it would contain data from all tables in a database)

- Use a standard workflow (without DLT) or a DLT pipeline?

For clarification, I want to store the raw data as Parquet files and then read them with the `cloudFiles` format (Auto Loader) to build the CDC and bronze tables using DLT. I think this approach is good because if I ever need to reprocess the raw data (let's say because the raw data schema changed), it feels safe: the source of truth is stored in an object store. Am I right?
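To make the reprocessing idea concrete, here is a minimal plain-Python sketch of what I mean by replaying the raw data. It assumes each raw record keeps the Debezium envelope (`op`, `before`, `after`, `source.lsn`); the `id` key, the function name and the sample rows are made up for illustration, and in a real pipeline this step would be done by DLT rather than hand-written code.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]


def replay_changes(events: Iterable[Dict[str, Any]], key: str = "id") -> Dict[Any, Row]:
    """Rebuild the current state of one table from raw Debezium-style events.

    Assumption: every event carries `op`, `before`, `after` and `source.lsn`.
    The `key` column name is hypothetical and stands in for the primary key.
    """
    state: Dict[Any, Row] = {}
    # Replay in log order so later changes win over earlier ones.
    for event in sorted(events, key=lambda e: e["source"]["lsn"]):
        op = event["op"]
        if op in ("c", "r", "u"):  # create, snapshot read, update
            row = event["after"]
            state[row[key]] = row
        elif op == "d":  # delete: only `before` is populated
            state.pop(event["before"][key], None)
    return state


# Made-up raw events, deliberately out of order, as they might sit in storage.
raw_events = [
    {"op": "u", "before": {"id": 1, "name": "a"}, "after": {"id": 1, "name": "b"},
     "source": {"table": "customers", "lsn": 20}},
    {"op": "c", "before": None, "after": {"id": 1, "name": "a"},
     "source": {"table": "customers", "lsn": 10}},
    {"op": "c", "before": None, "after": {"id": 2, "name": "c"},
     "source": {"table": "customers", "lsn": 30}},
    {"op": "d", "before": {"id": 2, "name": "c"}, "after": None,
     "source": {"table": "customers", "lsn": 40}},
]

current = replay_changes(raw_events)
```

Because the function only reads the raw events, I could run it again from scratch after a schema change and get the same table back, which is the property I am hoping to get from keeping the raw files in object storage.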

Thank you!