This article wraps up a technical deep dive into building large-scale Lakehouse architectures, revisiting design decisions from a 2019 platform that processed billions of payment records.
In the original platform, streaming pipelines ran on Spark Streaming while batch workflows were managed by Control-M. The split worked but created a fragmented operational model with no unified view of pipelines or dependencies. Grouping jobs onto shared clusters reduced costs but introduced new trade-offs: jobs lost isolation, retry logic had to live in application code, and observability suffered.
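To make "application-level retry logic" concrete, here is a minimal Python sketch of the kind of wrapper a job might carry when the scheduler cannot retry one step on its own. It is illustrative only and not the original platform's code; `run_with_retries` and its parameters are hypothetical names.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(
    step: Callable[[], T],
    max_attempts: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `step`, retrying on any exception with exponential backoff.

    Hypothetical helper: every job on a shared cluster would need a
    copy of something like this, because the orchestrator cannot
    restart a single step in isolation.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each time: base, 2*base, 4*base, ...
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Each job that embeds a wrapper like this also has to decide its own backoff, logging, and alerting, which is part of why retry behavior was hard to observe from outside the application.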
A real chargeback-processing scenario illustrates how these challenges compounded: streaming jobs, batch file arrivals, and external partner dependencies all had to be monitored across independent systems, which made root-cause analysis slow.
Modern capabilities like Lakeflow show how orchestration can be absorbed into the platform itself: unified batch and streaming, platform-native retries, task-level isolation, and integrated observability.
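As a rough illustration of platform-native retries and task-level isolation, the sketch below builds a job definition as a plain Python dictionary. The field names (`tasks`, `depends_on`, `max_retries`, `min_retry_interval_millis`, `retry_on_timeout`, `job_clusters`, `job_cluster_key`) follow the Databricks Jobs API as I understand it and should be checked against the current documentation. The job name, task keys, notebook paths, and cluster settings are hypothetical placeholders, not values from the article.

```python
def build_job_spec() -> dict:
    """Return a hypothetical two-task job definition.

    Each task gets its own job cluster (isolation) and its own retry
    policy handled by the platform rather than by application code.
    """
    def cluster(key: str) -> dict:
        # Placeholder cluster settings; adjust for a real workspace.
        return {
            "job_cluster_key": key,
            "new_cluster": {
                "spark_version": "15.4.x-scala2.12",  # placeholder
                "node_type_id": "i3.xlarge",          # placeholder
                "num_workers": 2,
            },
        }

    return {
        "name": "chargeback-processing",  # hypothetical job name
        "job_clusters": [cluster("ingest_cluster"), cluster("merge_cluster")],
        "tasks": [
            {
                "task_key": "ingest_partner_files",
                "job_cluster_key": "ingest_cluster",
                "notebook_task": {"notebook_path": "/Jobs/ingest"},  # hypothetical path
                "max_retries": 3,
                "min_retry_interval_millis": 60_000,
                "retry_on_timeout": True,
            },
            {
                "task_key": "merge_chargebacks",
                "depends_on": [{"task_key": "ingest_partner_files"}],
                "job_cluster_key": "merge_cluster",
                "notebook_task": {"notebook_path": "/Jobs/merge"},  # hypothetical path
                "max_retries": 1,
                "min_retry_interval_millis": 30_000,
                "retry_on_timeout": False,
            },
        ],
    }
```

The point of the sketch is that retries and dependencies are declared per task in the job definition, so a failure in one task can be retried and inspected without touching the others.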
Key takeaways:
- Unified orchestration reduces system fragmentation
- Platform-native retries beat ad-hoc application logic
- End-to-end observability shortens recovery time
- Task-level isolation prevents shared-cluster failure cascades
This part also closes the series, bringing ingestion, SCD processing, governance, and orchestration together into a single view of a modern Lakehouse with ZeroBus, Lakeflow, and Unity Catalog.
🔗 Full article: https://medium.com/@wesley.felipe/databricks-lakehouse-without-the-workarounds-part-4-orchestration-...