AWS SageMaker to the Azure Databricks.

thewfhengineer — Tue, 18 Nov 2025 13:47:04 GMT

I'm starting a project to migrate our compliance model (Pandas-based Python code) from AWS SageMaker to the Azure ecosystem.

Source: AWS (SageMaker, Airflow)
Target: Azure (Databricks, ADLS)

I'm evaluating the high-level approach and would appreciate your guidance. The core options I'm considering are:

Lift & Shift: Minimal code changes, focusing on getting it running on Databricks quickly.
Refactor & Modernize: Adapting the code to leverage native Azure/Databricks capabilities (like Spark) for better long-term performance.

What are your thoughts on the best path forward?

Can you share your experience?

#sagemaker #aws #migration

Re: AWS SageMaker to the Azure Databricks.

mark_ott — Wed, 19 Nov 2025 11:47:23 GMT

For migrating a Python/Pandas-based compliance model from AWS SageMaker/Airflow to Azure Databricks/ADLS, the best approach depends on priorities like speed, risk, cost, and future scalability. Both "Lift & Shift" and "Refactor & Modernize" have clear trade-offs that should be evaluated in the context of your business needs and long-term goals.

Lift & Shift

Pros
- Fastest migration: You can quickly get your existing code running with minimal changes, allowing for rapid testing and deployment on Azure Databricks.
- Lower immediate risk and effort: Familiarity with the existing codebase makes troubleshooting easier.
- Simplified rollback: If issues arise, you can revert to AWS more easily since core functionality remains unchanged.
Cons
- Limited scalability: Pandas runs in memory on a single node, so dataset size is capped by that node's RAM and Spark's distributed computing goes unused.
- Higher long-term operational costs: Pandas code runs only on the driver node, so on a multi-node cluster you pay for workers that sit idle.
- Missed Azure-native optimizations: You won’t benefit from performance improvements and integrations available with Spark, ADLS, or MLflow.
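
For a Lift & Shift, much of the Pandas logic can stay untouched if storage paths are the main thing that changes. Here is a minimal sketch, assuming the data moves from S3 to ADLS Gen2 with the same key layout. The helper name `s3_to_abfss`, the bucket, the container `compliance`, and the storage account `mystorageacct` are all hypothetical:

```python
from urllib.parse import urlparse


def s3_to_abfss(s3_uri: str, container: str, account: str) -> str:
    """Map an s3://bucket/key URI to the ADLS Gen2 abfss:// form, keeping the key."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3":
        raise ValueError(f"expected an s3:// URI, got: {s3_uri}")
    key = parsed.path.lstrip("/")  # drop the leading slash; the bucket name is discarded
    return f"abfss://{container}@{account}.dfs.core.windows.net/{key}"


# Hypothetical names, for illustration only.
new_path = s3_to_abfss(
    "s3://legacy-bucket/compliance/2025/input.parquet",
    container="compliance",
    account="mystorageacct",
)
print(new_path)
```

Reading such a path from Pandas typically also needs a filesystem backend and credentials configured on the cluster; that setup depends on your environment and is not shown here.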

Refactor & Modernize

Pros
- Performance boost: Migrating Pandas code to PySpark or the pandas API on Spark (`pyspark.pandas`, the successor to Koalas) unlocks better scalability for large datasets through distributed processing.
- Enhanced maintainability: Aligning with Databricks best practices and Azure-native features supports future-proofing and easier integration with ADLS, Delta Lake, and MLflow.
- Cost-effectiveness at scale: More efficient resource utilization on Databricks and easier management of data pipelines.
Cons
- Increased migration time and complexity: Significant code rewrites, team upskilling, and validation effort required.
- Higher upfront investment: More design, testing, and documentation compared to Lift & Shift, delaying deployment and ROI.
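
To give a sense of what the rewrite involves, here is one hypothetical compliance rule written in Pandas, followed by a PySpark version of the same rule. The column names (`txn_id`, `amount`, `flagged`) and the threshold are made up for illustration. The Spark function imports PySpark lazily, so it only runs where PySpark is available, such as a Databricks cluster:

```python
import pandas as pd


def flag_large_pandas(df: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Single-node version: add a boolean 'flagged' column without mutating the input."""
    out = df.copy()
    out["flagged"] = out["amount"] > threshold
    return out


def flag_large_spark(sdf, threshold: float):
    """Same rule for a pyspark.sql.DataFrame; evaluated across the cluster."""
    from pyspark.sql import functions as F  # imported lazily; needs PySpark installed

    return sdf.withColumn("flagged", F.col("amount") > F.lit(threshold))


# Hypothetical sample data.
sample = pd.DataFrame({"txn_id": [1, 2, 3], "amount": [50.0, 12000.0, 9999.0]})
result = flag_large_pandas(sample, threshold=10000.0)
print(result)
```

Simple column-wise rules like this translate almost line for line. The effort usually goes into code that relies on row order, index semantics, or Python loops over rows, which have no direct Spark equivalent.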

Recommendation

If your goal is rapid migration and validation, Lift & Shift is reasonable for initial deployment, especially if you’re constrained by time or resources. However, for sustainable operations and scalability, Refactor & Modernize offers better long-term value by fully leveraging Azure Databricks and ADLS capabilities. Many organizations blend both strategies: start with Lift & Shift for quick wins, then iteratively refactor critical components to Spark/Delta Lake as business needs grow.

Key Advice

- Assess data size, workflow complexity, and compliance requirements before deciding.
- Identify high-value pipelines or bottlenecks for targeted modernization.
- Plan for change management, team training, and thorough testing.
- Document the migration approach for auditability and future optimization.
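
On the testing point, one option is a parity check: run the same input through the legacy and migrated pipelines and compare the outputs. Here is a minimal sketch using `pandas.testing.assert_frame_equal`. The function name `outputs_match`, the key column, and the sample frames are hypothetical, and the tolerance is an assumption you would tune to your model:

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def outputs_match(legacy: pd.DataFrame, migrated: pd.DataFrame, key: str) -> bool:
    """Return True if both outputs hold the same rows, ignoring row and column order."""
    a = legacy.sort_values(key).reset_index(drop=True)
    b = migrated.sort_values(key).reset_index(drop=True)
    try:
        # check_like=True ignores column order; rtol bounds floating-point drift.
        assert_frame_equal(a, b, check_like=True, rtol=1e-9)
    except AssertionError:
        return False
    return True


# Hypothetical outputs from the old and new environments.
legacy_out = pd.DataFrame({"txn_id": [1, 2, 3], "score": [0.1, 0.5, 0.9]})
migrated_out = pd.DataFrame({"score": [0.9, 0.1, 0.5], "txn_id": [3, 1, 2]})
drifted_out = pd.DataFrame({"txn_id": [1, 2, 3], "score": [0.1, 0.5, 0.8]})

print(outputs_match(legacy_out, migrated_out, key="txn_id"))
print(outputs_match(legacy_out, drifted_out, key="txn_id"))
```

Saving the result of each comparison alongside the run also gives you a record to point to for the auditability item.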