Re: DLT or Databricks for CDC and NRT

mark_ott · 10-01-2025

For cost-sensitive, large-scale healthcare data streaming, using Delta Live Tables (DLT) for both CDC and streaming (Option C) is generally the most scalable, manageable, and cost-efficient approach. DLT natively supports both batch CDC and high-throughput streaming ingestion, and it offers robust autoscaling and simpler operations than traditional notebook-driven architectures.

Evaluation of Each Option

Option A: Databricks DBR Notebooks for CDC & Streaming

Pros: Consistent codebase and zero migration effort.
Cons: Notebooks lack the fine-grained autoscaling and operational abstraction of DLT. Scale-in is often less aggressive, leading to higher steady-state costs. Notebooks require more manual orchestration and monitoring, which can increase operational complexity over time.

Option B: DBR for CDC, DLT for Streaming

- Transition Effort: Migrating CDC logic from DBR notebooks to DLT requires refactoring pipeline code. The changes are largely syntactic: notebook-oriented code (e.g., direct Spark DataFrame operations) is replaced with declarative DLT transformations.
- Tools & Best Practices: Databricks provides documentation on migration from Spark notebooks to DLT pipelines, covering code adjustments, testing strategies, and deployment processes. However, there is no fully automated refactoring tool; migration is a semi-manual, guided process.
- Effort Estimation: For a typical CDC pipeline, expect 2–4 weeks of hands-on effort for initial migration, integration testing, and validation within the same workspace. Complexity increases with custom logic, external dependencies, or highly individualized notebook constructs.
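
To make the validation step concrete, here is a minimal plain-Python reference model of the table state a CDC pipeline should produce, which migrated output could be spot-checked against. It is a sketch, not DLT code: `ChangeEvent`, `apply_changes_model`, and the `UPSERT`/`DELETE` labels are hypothetical names chosen for illustration, and it assumes every change carries a key and an increasing sequence number.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One CDC record (hypothetical shape, for illustration only)."""
    key: str                              # primary key of the source row
    seq: int                              # ordering column, e.g. a log sequence number
    op: str                               # "UPSERT" or "DELETE"
    row: Optional[Dict[str, Any]] = None  # payload; None for deletes


def apply_changes_model(events: Iterable[ChangeEvent]) -> Dict[str, Dict[str, Any]]:
    """Return the current-state table after applying all events."""
    latest: Dict[str, ChangeEvent] = {}
    for ev in events:
        seen = latest.get(ev.key)
        # Keep only the event with the highest sequence number per key,
        # so late or duplicated events cannot overwrite newer state.
        if seen is None or ev.seq > seen.seq:
            latest[ev.key] = ev
    # Keys whose latest event is a delete are absent from the result.
    return {k: dict(ev.row or {}) for k, ev in latest.items() if ev.op != "DELETE"}


if __name__ == "__main__":
    sample = [
        ChangeEvent("p1", 1, "UPSERT", {"name": "A"}),
        ChangeEvent("p1", 3, "UPSERT", {"name": "C"}),
        ChangeEvent("p1", 2, "UPSERT", {"name": "B"}),  # arrives late
        ChangeEvent("p2", 1, "UPSERT", {"name": "X"}),
        ChangeEvent("p2", 2, "DELETE"),
    ]
    print(apply_changes_model(sample))
```

Because only the highest sequence number per key wins, replaying or reordering the same events yields the same table. That idempotency property is a useful thing to assert in the integration tests for a migrated pipeline.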

Option C: DLT for CDC & Streaming

- DLT Capabilities: DLT natively handles both batch (historical and periodic CDC) and streaming ingestion. It scales to thousands of events per second per topic with built-in reliability, idempotency, and schema enforcement.
- Cost Optimization: DLT’s autoscaling (especially the Enhanced Autoscaling feature) can aggressively scale down resources during low-volume periods, unlike notebook jobs, which tend to hold clusters at a fixed size. DLT also reduces the cloud compute footprint by orchestrating resources more efficiently, which lowers long-term costs.
- Technical Cost Demonstration: DLT exposes pipeline metrics and autoscaling history, which can be combined with billing data to derive cost per event or operation. Running a proof-of-value with identical workloads on both DBR notebooks and DLT pipelines can surface quantifiable cost differences. Many organizations report 15–30% lower steady-state costs with DLT due to automatic scale-in and resource pooling, though the actual figure depends on the workload shape and should be confirmed by the benchmark.
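
The scale-in argument can be illustrated with a toy model that compares worker-hours for a cluster sized for peak load against one that resizes every hour. This is only a sketch under stated assumptions: the load profile and per-worker capacity are made-up numbers, the function names are hypothetical, and the model ignores the driver node, scale-up latency, and any difference in per-hour pricing between the two compute types.

```python
import math
from typing import Sequence


def workers_needed(events_per_sec: float, per_worker_capacity: float,
                   min_workers: int = 1) -> int:
    """Workers required to keep up with a given event rate."""
    return max(min_workers, math.ceil(events_per_sec / per_worker_capacity))


def worker_hours_fixed(hourly_load: Sequence[float], per_worker_capacity: float) -> int:
    """Cluster sized once for the peak hour and held there all day."""
    peak = max(workers_needed(load, per_worker_capacity) for load in hourly_load)
    return peak * len(hourly_load)


def worker_hours_autoscaled(hourly_load: Sequence[float], per_worker_capacity: float,
                            min_workers: int = 1) -> int:
    """Cluster resized each hour to match that hour's load."""
    return sum(workers_needed(load, per_worker_capacity, min_workers)
               for load in hourly_load)


if __name__ == "__main__":
    # Hypothetical day: quiet night, busy daytime, moderate evening (events/sec).
    load = [200] * 8 + [2000] * 8 + [500] * 8
    capacity = 500  # assumed events/sec one worker can sustain
    fixed = worker_hours_fixed(load, capacity)
    scaled = worker_hours_autoscaled(load, capacity)
    print(f"fixed={fixed} autoscaled={scaled} saving={(fixed - scaled) / fixed:.0%}")
```

The gap between the two totals grows with how spiky the load is. A flat 24-hour profile gives no saving at all, which is why the benchmark should replay a realistic traffic pattern rather than a constant rate.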

Recommendations

Recommended Option: Option C (DLT for both CDC and Streaming) is optimal for your scenario, given the performance needs, cost sensitivity, and desired operational simplicity.
- DLT is designed to unify batch and streaming workflows, and at your scale the operational savings typically outweigh the initial migration effort.
- To convince cost-sensitive stakeholders, implement a short-term POC where you benchmark identical workloads on both approaches and collect operational cost data.
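
One way to present the POC results is to normalize each run to cost per million events and report the relative difference. The sketch below assumes you have already collected the total compute cost and event count for each run from your own billing and pipeline data; `RunResult`, `cost_per_million`, and `relative_saving` are hypothetical helper names, and the figures in the example are placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    label: str
    events_processed: int
    compute_cost_usd: float  # platform plus cloud VM charges for the run window


def cost_per_million(run: RunResult) -> float:
    """Normalize a run's cost to USD per one million events."""
    if run.events_processed <= 0:
        raise ValueError(f"run {run.label!r} processed no events")
    return run.compute_cost_usd * 1_000_000 / run.events_processed


def relative_saving(baseline: RunResult, candidate: RunResult) -> float:
    """Fraction by which the candidate is cheaper per event (negative if costlier)."""
    base = cost_per_million(baseline)
    return (base - cost_per_million(candidate)) / base


if __name__ == "__main__":
    # Placeholder numbers; substitute the values measured in your POC.
    notebook_run = RunResult("DBR notebooks", 50_000_000, 100.0)
    dlt_run = RunResult("DLT", 50_000_000, 80.0)
    print(f"{notebook_run.label}: ${cost_per_million(notebook_run):.2f} per million events")
    print(f"{dlt_run.label}: ${cost_per_million(dlt_run):.2f} per million events")
    print(f"relative saving: {relative_saving(notebook_run, dlt_run):.1%}")
```

Normalizing per event keeps the comparison fair even if the two runs did not process exactly the same volume.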
Migration Effort (if choosing Option B or transitioning to C):
- Use Databricks’ official migration guides and allocate 2–4 weeks for CDC pipeline refactoring, integration, and acceptance tests.
- Engage Databricks solution architects for advanced optimization and troubleshooting.
