Hi everyone, I wanted to share a three-part series I recently published on Medium that examines architectural patterns from a real Databricks-based data consolidation project.
The specific case is a logistics platform unifying two legacy systems into a denormalized order model. But the series is really about a broader set of questions: what happens when you treat a unified data model as a single recomputable structure, what that decision implies for the pipelines maintaining it, and which foundational primitives would change the shape of the problem.
A few of the themes the chapters develop:
• A unified data model as a recomputation contract
• The structural inevitability of dual-pipeline divergence without CDC
• Why centralized state forces distributed reconstruction
• Architectural directions — including Lakeflow and Lakebase — that respond to these patterns
Part 1 — How a single SQL query became our domain model
Part 2 — Two pipelines, one model, and the drift we couldn't avoid
Part 3 — Why centralized state forces distributed reconstruction