Migrating From Azure to Databricks

Pratikmsbsvm
New Contributor III

Hi Techies,

Could someone please help me with the pros and cons of migrating my real-time streaming solution from Azure to Databricks?

Which components can I replace with Databricks, and what benefits would I get from doing so?

Current architecture:

HLD.png

Many thanks

1 ACCEPTED SOLUTION

lingareddy_Alva
Honored Contributor II

Hi @Pratikmsbsvm 

Looking at your current Azure streaming architecture, I can help you understand the pros and cons of migrating to Databricks. Let me break this down by component and overall considerations:

Components That Can Be Replaced with Databricks
Azure Stream Analytics → Databricks Structured Streaming
- What changes: Replace ASA with Spark Structured Streaming in Databricks
- Benefits: More flexible transformations, custom logic, ML integration, better debugging
- Considerations: Requires more development effort and Spark expertise
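
As a rough illustration of what replacing an ASA job could look like, here is a minimal sketch that reads Event Hubs through its Kafka-compatible endpoint and lands the raw events in a Delta table. The function names, paths, and the `body`/`enqueued_at` column names are hypothetical, and the `kafkashaded.` class prefix assumes the code runs on a Databricks cluster; treat it as a starting point rather than a finished pipeline.

```python
def eventhubs_kafka_options(namespace: str, event_hub: str, connection_string: str) -> dict:
    """Build Kafka-source options for the Event Hubs Kafka-compatible endpoint.

    Assumption: running on Databricks, where the Kafka client classes are
    shaded under the `kafkashaded.` prefix.
    """
    jaas = (
        "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="$ConnectionString" password="{connection_string}";'
    )
    return {
        "kafka.bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "kafka.security.protocol": "SASL_SSL",
        "kafka.sasl.mechanism": "PLAIN",
        "kafka.sasl.jaas.config": jaas,
        "subscribe": event_hub,  # the event hub name acts as the Kafka topic
        "startingOffsets": "latest",
    }


def start_bronze_stream(spark, options: dict, target_path: str, checkpoint_path: str):
    """Append raw events to a Delta table; `spark` is the notebook's SparkSession."""
    raw = spark.readStream.format("kafka").options(**options).load()
    # Keep the payload as a string; parsing and business logic would follow here.
    events = raw.selectExpr("CAST(value AS STRING) AS body", "timestamp AS enqueued_at")
    return (
        events.writeStream.format("delta")
        .outputMode("append")
        .option("checkpointLocation", checkpoint_path)
        .start(target_path)
    )
```

In a notebook you would call `start_bronze_stream(spark, eventhubs_kafka_options(...), target_path, checkpoint_path)`, ideally reading the connection string from a secret scope rather than hard-coding it.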

Azure Data Lake Gen2 → Databricks Delta Lake
- What changes: Use Delta Lake format on your existing ADLS Gen2 storage
- Benefits: ACID transactions, time travel, schema evolution, better performance
- Considerations: Delta Lake works great with ADLS Gen2, minimal migration needed
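
To make the Delta Lake benefits a bit more concrete, the sketch below shows schema evolution on write and time travel on read against a table stored in ADLS Gen2. The helper names and the container, account, and path values are hypothetical; `df` and `spark` are assumed to be an existing DataFrame and SparkSession.

```python
def abfss_path(container: str, storage_account: str, relative_path: str) -> str:
    """Build an ADLS Gen2 URI of the form abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return (
        f"abfss://{container}@{storage_account}.dfs.core.windows.net/"
        f"{relative_path.lstrip('/')}"
    )


def append_with_schema_evolution(df, table_path: str) -> None:
    """Append to a Delta table, allowing new columns in `df` to be added to its schema."""
    (
        df.write.format("delta")
        .mode("append")
        .option("mergeSchema", "true")
        .save(table_path)
    )


def read_version(spark, table_path: str, version: int):
    """Time travel: read the table as it was at a given commit version."""
    return spark.read.format("delta").option("versionAsOf", version).load(table_path)
```

For example, `read_version(spark, abfss_path("lake", "mystorageacct", "bronze/events"), 0)` would return the table as of its first commit.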

Azure SQL Database → Databricks SQL + Delta Lake
- What changes: Move analytical workloads to Delta Lake, keep transactional data in SQL DB
- Benefits: Better performance for analytics, unified data platform
- Considerations: May still need SQL DB for transactional systems


Pros of Migration to Databricks
Technical Benefits

- Unified Platform: Single platform for streaming, batch, ML, and analytics
- Advanced Analytics: Built-in ML capabilities, easy model deployment
- Better Performance: Optimized Spark engine, Delta Lake optimizations
- Flexibility: Custom transformations, complex event processing
- Scalability: Auto-scaling clusters, better resource utilization

Operational Benefits
- Simplified Architecture: Fewer moving parts, unified monitoring
- Cost Optimization: Pay-per-use model, automatic cluster termination
- Developer Productivity: Notebooks, collaborative environment, version control
- Data Governance: Unity Catalog for centralized metadata and security

Cons of Migration to Databricks
Complexity & Skills
- Learning Curve: Team needs Spark/Python/Scala expertise
- Development Overhead: More complex than drag-and-drop ASA
- Debugging: Streaming jobs can be harder to troubleshoot

Operational Challenges
- Monitoring: Need to set up comprehensive monitoring for Spark jobs
- Latency: Structured Streaming runs in micro-batches by default, so it may have slightly higher latency than ASA for simple transformations
- Maintenance: More infrastructure to manage and tune

Cost Considerations
- Compute Costs: Can be higher if not properly optimized
- Learning Investment: Time and training costs for team upskilling.

Migration Strategy Recommendations:
Hybrid Approach (Recommended)
1. Keep: Event Hubs, ADLS Gen2, existing applications
2. Replace: Stream Analytics with Databricks Structured Streaming
3. Enhance: Add Delta Lake format, ML capabilities
4. Gradual: Migrate one pipeline at a time

Components to Retain
- Azure Event Hubs: Excellent integration with Databricks
- ADLS Gen2: Works perfectly with Databricks Delta Lake
- Power BI: Native integration with Databricks SQL
- Existing Applications: Can connect to Databricks via JDBC/REST APIs

When Migration Makes Sense
Migrate if you need:
- Complex transformations or custom business logic
- Real-time ML inference
- Advanced analytics capabilities
- Better cost optimization for large-scale processing
- Unified platform for multiple data workloads

Stay with current setup if:
- Simple aggregations and transformations are sufficient
- Team lacks Spark expertise and timeline is tight
- Current solution meets all performance requirements
- Minimal budget for platform changes.

LR

5 REPLIES

Thanks, LRALVA. Could you please help me with the cost side? I don't have production-level costing knowledge. Thanks.

Hi @Pratikmsbsvm 

Azure Stream Analytics has no upfront costs: you pay only for the streaming units you consume, with no commitment or cluster provisioning required.

Databricks Cost Structure
Two-Layer Pricing Model
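1. Azure VM Compute Costs (the virtual machines behind each cluster)
2. Databricks Units (DBUs) (the Databricks platform charge, which scales with cluster size and runtime)
With Azure Databricks, both layers are invoiced through your Azure subscription.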

Typical Costs for Streaming Workloads
Small to Medium Streaming Job:
- VM Costs: $200-500/month (Standard_DS3_v2 cluster)
- DBU Costs: $300-800/month (depending on tier and usage)
- Total: $500-1,300/month

Large Streaming Job:
- VM Costs: $800-2,000/month (larger clusters)
- DBU Costs: $1,000-3,000/month
- Total: $1,800-5,000/month
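
If you want to plug in your own numbers, the two-layer model above boils down to simple arithmetic. The sketch below is only that arithmetic: the class and field names are hypothetical, and every rate is an input you would look up on the current Azure and Databricks pricing pages for your region, tier, and VM size. No real prices are baked in.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # average hours in a month, for an always-on streaming cluster


@dataclass
class ClusterCost:
    nodes: int                 # driver + workers
    vm_rate_per_hour: float    # Azure VM price per node-hour (your lookup)
    dbu_per_node_hour: float   # DBUs one node of this VM type consumes per hour
    dbu_price: float           # price per DBU for your tier and workload type
    hours: float = HOURS_PER_MONTH

    def vm_cost(self) -> float:
        """Layer 1: the Azure VM charge."""
        return self.nodes * self.vm_rate_per_hour * self.hours

    def dbu_cost(self) -> float:
        """Layer 2: the DBU charge."""
        return self.nodes * self.dbu_per_node_hour * self.dbu_price * self.hours

    def total(self) -> float:
        return self.vm_cost() + self.dbu_cost()
```

Running this for a couple of candidate cluster sizes, with and without always-on hours, is a quick way to check whether your workload lands in the small or large range quoted above.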

Cost Optimization Strategies for Databricks
1. Cluster Optimization

# Use spot instances with on-demand fallback (60-90% savings on the VM portion of the bill)
"azure_attributes": {"availability": "SPOT_WITH_FALLBACK_AZURE"}

# Auto-terminate idle clusters to avoid paying for unused compute
"autotermination_minutes": 30

# Right-size clusters by letting them scale with the workload
"autoscale": {"min_workers": 2, "max_workers": 8}

2. Workload Optimization
- Batch vs Streaming: Use batch processing where real-time isn't critical
- Resource Pooling: Share clusters across multiple workloads
- Delta Lake: Reduce storage costs with compression and optimization
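
One way to act on the batch-vs-streaming point without maintaining two code paths is to keep the Structured Streaming job and only change its trigger. The helper below is a hypothetical sketch that picks keyword arguments for `DataStreamWriter.trigger()`; it assumes a reasonably recent runtime where the `availableNow` trigger is supported.

```python
from typing import Optional


def trigger_options(max_latency_seconds: Optional[int]) -> dict:
    """Return keyword arguments for DataStreamWriter.trigger().

    None means there is no real-time requirement: process whatever has
    arrived, then stop, so the job can run on a schedule and the cluster
    can shut down in between.
    """
    if max_latency_seconds is None:
        return {"availableNow": True}
    # Otherwise run continuously, starting a micro-batch at this interval.
    return {"processingTime": f"{int(max_latency_seconds)} seconds"}
```

Usage would look like `df.writeStream.trigger(**trigger_options(30))` for a near-real-time pipeline, or `trigger_options(None)` for a pipeline that only needs to catch up a few times a day.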

3. Pricing Tier Selection
- Standard: For basic streaming workloads
- Premium: Only if you need advanced security/governance
- Consider Reserved Instances: For predictable workloads
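
For reference, the cluster settings from the Cluster Optimization item can be combined into a single cluster definition. The sketch below builds one as a Python dict using field names from the Databricks Clusters API; the function name and defaults are hypothetical, and the runtime version is left as a parameter because it depends on your workspace. `first_on_demand: 1` is an extra setting (not mentioned above) that keeps the first node, normally the driver, on an on-demand VM.

```python
import json


def streaming_cluster_spec(
    name: str,
    node_type_id: str,
    spark_version: str,
    min_workers: int = 2,
    max_workers: int = 8,
    idle_minutes: int = 30,
    use_spot: bool = True,
) -> dict:
    """Assemble a cluster definition with autoscaling, auto-termination and spot settings."""
    availability = "SPOT_WITH_FALLBACK_AZURE" if use_spot else "ON_DEMAND_AZURE"
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        "autotermination_minutes": idle_minutes,
        "azure_attributes": {
            "availability": availability,
            "first_on_demand": 1,  # keep the first node off spot capacity
        },
    }


def to_json(spec: dict) -> str:
    """Serialize the definition, e.g. for the Clusters API or a job definition."""
    return json.dumps(spec, indent=2)
```

For example, `to_json(streaming_cluster_spec("asa-pilot", "Standard_DS3_v2", "<your-runtime-version>"))` gives you a payload you can adapt.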

Break-Even Analysis:
When Databricks Becomes Cost-Effective:
You'll likely save money with Databricks if:
- You're running 15+ streaming units in ASA
- You need complex transformations (reducing development time)
- You're already planning ML/advanced analytics initiatives
- You can consolidate multiple ASA jobs into shared Databricks clusters
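
The streaming-unit threshold above is a rule of thumb, so it is worth recomputing with your own prices. A minimal sketch, assuming ASA is billed per streaming unit per hour and that you already have a monthly Databricks estimate; the function name is hypothetical and both rates are inputs rather than real prices.

```python
import math


def break_even_streaming_units(
    databricks_monthly_cost: float,
    su_rate_per_hour: float,
    hours: float = 730,
) -> int:
    """Smallest whole number of ASA streaming units whose monthly cost
    reaches the given Databricks monthly cost."""
    asa_cost_per_unit = su_rate_per_hour * hours
    return math.ceil(databricks_monthly_cost / asa_cost_per_unit)
```

This only compares compute charges. It does not capture the other factors in the list, such as development time saved or consolidating several ASA jobs onto one cluster.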

Recommendation
Start Small: Begin with a pilot migration of your most complex streaming job to Databricks while keeping simple aggregations in ASA.
This hybrid approach lets you:
- Compare actual costs vs projections
- Build team expertise gradually
- Minimize migration risk
- Optimize for the best cost/benefit ratio per workload


LR

vaibhavs120
Contributor

I completely agree with @lingareddy_Alva on the costing part. One small point I would add: we should only enable spot instances (60-90% cost savings) in development environments or for non-critical production workloads. The option works well and is indeed cost-effective, but it is not suitable for mission-critical workloads. I used it for one of my daily loads, and the process sometimes terminated abruptly. @lingareddy_Alva, please correct me if I am wrong here.

Vaibhav Sharma
Databricks Certified Professional
Microsoft Azure Certified Professional
Microsoft Certified Trainer

I agree with you, @vaibhavs120. Thanks for bringing this up.

LR
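
The spot-instance rule agreed on in this thread can be encoded as a small guard when generating cluster definitions. This is a hypothetical sketch: the function name and the `"prod"` environment label are assumptions, while the returned strings are the `azure_attributes.availability` values used by Azure Databricks.

```python
def availability_for(environment: str, mission_critical: bool) -> str:
    """Pick spot capacity only where an abrupt eviction is tolerable."""
    if environment.lower() == "prod" and mission_critical:
        # Mission-critical production jobs stay on on-demand VMs.
        return "ON_DEMAND_AZURE"
    # Dev/test and non-critical workloads take the spot discount,
    # falling back to on-demand when spot capacity is unavailable.
    return "SPOT_WITH_FALLBACK_AZURE"
```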