After reviewing a surprising number of Databricks discussions about SCD2, CDC, historical reporting, and temporal joins, I noticed that most historical data modeling challenges seem to fall into a small set of recurring patterns:
- Historical Backfill
- Late Arriving Dimension
- Early Arriving Fact
- Snapshot Reproducibility
- Historical Match Ambiguity
- Historical State Consolidation
What's interesting is that while the implementation details differ, the underlying modeling problems often look very similar.
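One way to see the shared core: several of these patterns show up at the moment a fact is matched to the SCD2 dimension version that was valid at the fact's event time. Below is a minimal plain-Python sketch of that as-of lookup, not a Databricks implementation. It assumes half-open validity windows `[valid_from, valid_to)` with `valid_to = None` meaning "current". The `DimVersion` type, the `as_of_lookup`/`classify` helpers, and the sample data are all hypothetical. Reading "no match" as the symptom of an early-arriving fact / late-arriving dimension, and "multiple matches" as one form of historical match ambiguity, is my interpretation of the categories.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class DimVersion:
    """One SCD2 row (hypothetical schema)."""
    key: str                  # business key, e.g. a customer id
    attr: str                 # tracked attribute value for this version
    valid_from: date          # inclusive start of validity
    valid_to: Optional[date]  # exclusive end; None means "current version"


def as_of_lookup(versions: List[DimVersion], key: str, event_date: date) -> List[DimVersion]:
    """Return every version of `key` whose half-open window contains `event_date`."""
    return [
        v for v in versions
        if v.key == key
        and v.valid_from <= event_date
        and (v.valid_to is None or event_date < v.valid_to)
    ]


def classify(versions: List[DimVersion], key: str, event_date: date) -> str:
    """Resolve a fact to one attribute value, or report why it can't be resolved."""
    matches = as_of_lookup(versions, key, event_date)
    if not matches:
        # Fact arrived before (or without) a matching dimension version.
        return "unmatched"
    if len(matches) > 1:
        # Overlapping validity windows: more than one version qualifies.
        return "ambiguous"
    return matches[0].attr


# Hypothetical sample dimension; C2 deliberately has overlapping windows.
dim = [
    DimVersion("C1", "Silver", date(2024, 1, 1), date(2024, 6, 1)),
    DimVersion("C1", "Gold", date(2024, 6, 1), None),
    DimVersion("C2", "Bronze", date(2024, 3, 1), date(2024, 9, 1)),
    DimVersion("C2", "Silver", date(2024, 8, 1), None),
]

print(classify(dim, "C1", date(2024, 3, 15)))  # Silver
print(classify(dim, "C1", date(2024, 6, 1)))   # Gold (end date is exclusive)
print(classify(dim, "C2", date(2024, 8, 15)))  # ambiguous
print(classify(dim, "C3", date(2024, 1, 1)))   # unmatched
```

The half-open window is a design choice that keeps a boundary date from matching two adjacent versions; closed `[from, to]` windows are common too, but they make the C1 boundary case ambiguous.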
Am I missing any major historical modeling patterns?
Curious how others would categorize these problems.