- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-02-2024 03:41 AM
Hi,
I am using debezium server to send data from Postgres to a Kafka topic (in fact, Azure EventHub). My question is, what are the best practices and recommendations to save raw data and then implement a medallion architecture?
For clarification, I want to store raw data as delta format and then use them as cloudfiles format for CDC and bronze tables using DLT. I think this approach is good because if I need to reprocess raw data (let's say because raw data schema changed and I need to reprocess it), I feel it is safe because the truth is stored in an object store.
I am using Unity Catalog, but I am thinking about different implementations:
- Where to store? External location, volume or Unity Catalog table?
- Use a standard workflow or a DLT pipeline?
Am I facing this problem right?
Thank you in advance!