Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Redshift to Databricks Migration with Lakebridge

abhijit007
Databricks Partner

We are currently performing an assessment for a client's Redshift to Databricks migration, and we would like to better understand the enhanced capabilities of Lakebridge for this use case.

We would appreciate clarification on the following points:

Scope of Lakebridge capabilities
Is Lakebridge capable of handling:
  • Data migration
  • Metadata migration
  • Workflow / orchestration migration

Migration lifecycle coverage
Does Lakebridge support the full migration lifecycle, including discovery, analysis, and conversion?
My earlier understanding was that it primarily supported analysis, so I would like to confirm the current capabilities.

Licensing and cost
I understand that Lakebridge is an open-source project. In its current form or recent enhancements, is there any licensing cost or commercial offering involved?

Databricks best practices
Are there any official Databricks-recommended best practices or reference architectures specifically for Redshift to Databricks migrations, especially when using Lakebridge?

Thanks in advance for your guidance and insights.
Any details or references would be greatly appreciated.

#lakebridge #redshift

1 ACCEPTED SOLUTION


Ashwin_DSA
Databricks Employee

Hi @abhijit007 

For a Redshift-to-Databricks migration, Lakebridge is designed to automate the code and metadata side of the migration and to help you validate results on Databricks. Lakebridge does not copy data out of Redshift itself; data movement is typically handled via Databricks Lakeflow, native connectors, or cloud data-migration tools, with Lakebridge used to profile the estate and then validate the data on Databricks once it has landed. Lakebridge can scan your Redshift SQL and objects, estimate migration effort, and convert a large portion of SQL and DDL into Databricks SQL, surfacing what needs manual review. It also focuses on the SQL and ETL logic inside pipelines; orchestrators (e.g., Airflow, Step Functions, other schedulers) are typically re-implemented on, or integrated with, Databricks Lakeflow Jobs as part of the overall migration plan.
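As a toy illustration of what deterministic SQL conversion involves (this is not Lakebridge's actual implementation; the rewrite rules below are simplified assumptions), Redshift-specific constructs like GETDATE() or DISTKEY/SORTKEY clauses have no direct Databricks SQL equivalent and must be rewritten or dropped:

```python
import re

# Simplified, illustrative rewrite rules for a few Redshift-isms.
# A real converter parses the SQL properly; naive regexes are only a sketch.
RULES = [
    (re.compile(r"\bGETDATE\(\)", re.IGNORECASE), "current_timestamp()"),
    # DISTSTYLE/DISTKEY/SORTKEY are Redshift physical-layout hints with no
    # Databricks SQL equivalent; drop them (data layout is handled differently).
    (re.compile(r"\s*DISTSTYLE\s+\w+", re.IGNORECASE), ""),
    (re.compile(r"\s*DISTKEY\s*\([^)]*\)", re.IGNORECASE), ""),
    (re.compile(r"\s*SORTKEY\s*\([^)]*\)", re.IGNORECASE), ""),
]

def convert(redshift_sql: str) -> str:
    """Apply each rewrite rule in order and return Databricks-flavoured SQL."""
    out = redshift_sql
    for pattern, replacement in RULES:
        out = pattern.sub(replacement, out)
    return out

ddl = "CREATE TABLE sales (id INT, ts TIMESTAMP DEFAULT GETDATE()) DISTKEY(id) SORTKEY(ts)"
print(convert(ddl))
# CREATE TABLE sales (id INT, ts TIMESTAMP DEFAULT current_timestamp())
```

Statements the converter cannot translate automatically are exactly the ones Lakebridge flags for manual review.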

Lakebridge supports most of the technical migration lifecycle: discovery and assessment (the Profiler and Analyzer inventory objects, classify complexity, and estimate effort) and conversion (automated SQL/ETL translation using a combination of deterministic converters and LLM-based translation for more complex patterns). It will also help with reconciling schemas and data (row counts, aggregates, column checks) between Redshift and Databricks to build confidence before the cut-over. It does not replace overall project management, cut-over planning, or data-movement plumbing, but it plugs into those processes.
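To make the reconciliation idea concrete, here is a minimal hand-rolled sketch (not Lakebridge's API; the table extracts are hypothetical) of the kinds of checks involved: comparing row counts and per-column aggregates between source and target:

```python
# Minimal sketch of source/target reconciliation: row counts and column
# aggregates. A real tool (e.g., Lakebridge Reconcile) runs these as SQL
# against Redshift and Databricks; here we compare two in-memory extracts.

def reconcile(source_rows, target_rows, numeric_columns):
    """Compare row counts and per-column sums; return a dict of check results."""
    report = {"row_count_match": len(source_rows) == len(target_rows)}
    for col in numeric_columns:
        src_sum = sum(row[col] for row in source_rows)
        tgt_sum = sum(row[col] for row in target_rows)
        report[f"{col}_sum_match"] = src_sum == tgt_sum
    return report

# Hypothetical extracts of the same table from Redshift and Databricks.
redshift_extract = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 5.5}]
databricks_extract = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 5.5}]

print(reconcile(redshift_extract, databricks_extract, ["amount"]))
# {'row_count_match': True, 'amount_sum_match': True}
```

Any check that comes back False points you at the tables to investigate before cut-over.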

From a licensing and cost perspective, Lakebridge is a free Databricks migration tool. There is no license cost or consumption-based fee for using it. Parts of it (like Profiler and Reconcile) are open source. The Analyzer and converters are free but not open-sourced.

For Redshift to Databricks, we generally recommend:

  • Use the Redshift to Databricks Migration Guide to structure the project into clear phases (discovery, assessment, design, migration, validation).
  • Run Lakebridge to profile and assess the Redshift estate, then use its conversion capabilities to translate as much SQL/ETL as possible and highlight what needs manual refactoring.
  • Use your preferred data-movement tooling to land data in the lakehouse, then use Lakebridge reconciliation (and optionally partner tools like Datafold) to verify that the migrated workloads are correct before cut-over.

Here are some references that may help.

Hope this helps.

If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***


2 REPLIES


pradeep_singh
Contributor III

There is a nice course on Partner Academy as well. It uses SQL Server as the source system for the migration, but you can follow the same steps for Redshift.

https://partner-academy.databricks.com/learn/courses/4326/lakebridge-for-sql-source-system-migration

Thank You
Pradeep Singh - https://www.linkedin.com/in/dbxdev