12-26-2023 05:50 AM - edited 12-26-2023 05:53 AM
Hi, first of all, thank you all in advance! I am very interested in this topic!
My question goes beyond what is described here. Like @Pektas, I am using Debezium to send data from Postgres to a Kafka topic (in fact, Azure Event Hubs). What are the best practices and recommendations for saving raw data and then implementing a medallion architecture?
I am using Unity Catalog, and I am considering a few implementation options:
- Use a table or a volume for raw data? (A table would contain data from all tables in the database.)
- Use a standard workflow or a DLT pipeline (that is, use DLT or not)?
To clarify, I want to store the raw data as Parquet files and then read them with Auto Loader (the cloudFiles format) to feed the CDC and bronze tables in DLT. I think this approach is good because if I need to reprocess the raw data (for example, because its schema changed), it is safe to do so: the source of truth is kept in an object store. Am I right?
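To make the idea concrete, here is a rough sketch of the bronze step I have in mind. It is only an illustration under assumptions: the volume path `RAW_PATH`, the table name `bronze_cdc_events`, and the helper `autoloader_options` are placeholders I made up, and I have not confirmed that this is the recommended pattern.

```python
# Hypothetical location of the raw Parquet files (a Unity Catalog volume path).
RAW_PATH = "/Volumes/main/raw/debezium/events"


def autoloader_options() -> dict:
    """Auto Loader options for reading the raw Parquet files incrementally."""
    return {
        "cloudFiles.format": "parquet",
        # Assumption: let new columns be added when the raw schema changes.
        "cloudFiles.schemaEvolutionMode": "addNewColumns",
    }


# The Databricks dlt module is only available inside a DLT pipeline,
# so the table definition is skipped when this file runs anywhere else.
try:
    import dlt
    IN_DLT = hasattr(dlt, "table")
except ImportError:
    IN_DLT = False

if IN_DLT:
    @dlt.table(
        name="bronze_cdc_events",
        comment="Raw Debezium change events loaded from Parquet files.",
    )
    def bronze_cdc_events():
        # `spark` is the session that Databricks provides in the pipeline.
        return (
            spark.readStream.format("cloudFiles")
            .options(**autoloader_options())
            .load(RAW_PATH)
        )
```

If this is reasonable, reprocessing would mean running a full refresh of the pipeline so the bronze table is rebuilt from the files that are still in the object store.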
Thank you!