To migrate tables and views from Snowflake (source) to Databricks (target) using Lakebridge, you export the data from Snowflake to a supported cloud storage location (typically as Parquet files) and then load those files into Databricks Delta tables. Lakebridge streamlines the surrounding work, particularly code conversion, schema translation, and reconciliation, while the physical movement of large data volumes is done by staging to cloud storage and loading into Databricks.
Step-by-Step Data Migration Process
1. Export Data from Snowflake
- Use the Snowflake COPY INTO command to export your tables as Parquet files to a cloud storage bucket (S3, ADLS, or GCS). Example:
-- Unloading directly to a cloud URL requires a storage integration or credentials
COPY INTO 's3://your-bucket/path/'
FROM my_database.my_schema.my_table
FILE_FORMAT = (TYPE = PARQUET COMPRESSION = SNAPPY);
- For large tables, partition the export for efficiency (e.g., partition by a date column).
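If you prefer to drive the export from Python rather than a Snowflake worksheet, the sketch below runs a partitioned unload through the snowflake-connector-python package. The account, credentials, stage path, storage integration name, and partition column are placeholders, not values Lakebridge provides:

import snowflake.connector

# Placeholder connection details -- substitute your own account and credentials
conn = snowflake.connector.connect(
    account="your_account",
    user="migration_user",
    password="********",
    warehouse="MY_WH",
    database="MY_DATABASE",
    schema="MY_SCHEMA",
)

# Unload the table as Snappy-compressed Parquet, partitioned by a date column,
# through a pre-created storage integration (hypothetical name below)
conn.cursor().execute("""
    COPY INTO 's3://your-bucket/export/my_table/'
    FROM my_database.my_schema.my_table
    PARTITION BY ('order_date=' || TO_VARCHAR(order_date))
    STORAGE_INTEGRATION = my_s3_integration
    FILE_FORMAT = (TYPE = PARQUET COMPRESSION = SNAPPY)
    HEADER = TRUE
""")
conn.close()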
2. Set Up Cloud Storage Access in Databricks
- Ensure Databricks has access to your storage location by configuring the necessary credentials and permissions.
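As a minimal sketch, the snippet below wires S3 access keys from a Databricks secret scope into the Spark session; the scope and key names are placeholders. On newer workspaces, Unity Catalog external locations or instance profiles are the preferred way to grant this access:

# Pull storage credentials from a Databricks secret scope (placeholder names)
access_key = dbutils.secrets.get(scope="migration", key="aws-access-key")
secret_key = dbutils.secrets.get(scope="migration", key="aws-secret-key")

# Make the keys available to Spark for reading the export path
spark.conf.set("fs.s3a.access.key", access_key)
spark.conf.set("fs.s3a.secret.key", secret_key)

# Sanity check: the export path should now be listable
display(dbutils.fs.ls("s3://your-bucket/path/"))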
3. Load Data into Databricks Delta Tables
- Use Databricks notebooks or workflows to create Delta tables and load data from the Parquet files:
# Read the staged Parquet files and write them out as a Delta table
df = spark.read.format("parquet").load("s3://your-bucket/path/")
df.write.format("delta").mode("overwrite").save("/mnt/delta/target_table")
- For continuous or streaming loads, use Auto Loader or Databricks workflows for incremental or live updates.
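A minimal Auto Loader sketch for incrementally ingesting newly arriving Parquet files is shown below; the schema and checkpoint locations and the target table name are placeholders:

# Incrementally ingest new Parquet files from the export path with Auto Loader
(
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("cloudFiles.schemaLocation", "s3://your-bucket/_schemas/my_table/")
    .load("s3://your-bucket/path/")
    .writeStream
    .option("checkpointLocation", "s3://your-bucket/_checkpoints/my_table/")
    .trigger(availableNow=True)  # process all available files, then stop
    .toTable("main.migrated.my_table")
)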
4. Migrate and Create Views/SQL Logic
- After the data is migrated, convert your Snowflake views and SQL queries with Lakebridge’s Converter. Validate the translated SQL, then deploy the scripts in Databricks as new views.
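Once the Converter has produced Databricks-compatible SQL, deploying a view is a single statement; the view, table, and column names below are illustrative, and the actual definition should come from the converted output:

# Deploy a converted view definition (illustrative names and SQL)
spark.sql("""
    CREATE OR REPLACE VIEW main.migrated.v_daily_sales AS
    SELECT order_date, SUM(amount) AS total_amount
    FROM main.migrated.orders
    GROUP BY order_date
""")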
5. Reconciliation Preparation
- Once data and views are migrated, use Lakebridge’s reconciliation tools to compare row counts, aggregates, and schemas between Snowflake and Databricks to ensure fidelity.
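Alongside Lakebridge’s reconcile tooling, a quick spot check of row counts and a simple aggregate can be run from a notebook, reading Snowflake through the connector available on Databricks; the connection options, table, and column names below are placeholders:

# Read the source table from Snowflake (placeholder connection options)
sf_options = {
    "sfUrl": "youraccount.snowflakecomputing.com",
    "sfUser": "migration_user",
    "sfPassword": dbutils.secrets.get(scope="migration", key="snowflake-password"),
    "sfDatabase": "MY_DATABASE",
    "sfSchema": "MY_SCHEMA",
    "sfWarehouse": "MY_WH",
}
src = (spark.read.format("snowflake").options(**sf_options)
       .option("dbtable", "MY_TABLE").load())

# Read the migrated Delta table and compare basic metrics
tgt = spark.read.format("delta").load("/mnt/delta/target_table")
print("source rows:", src.count(), "target rows:", tgt.count())
print("source amount sum:", src.agg({"AMOUNT": "sum"}).collect()[0][0])
print("target amount sum:", tgt.agg({"amount": "sum"}).collect()[0][0])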
Key Reminders
- For metadata (schemas, DDL), leverage the Lakebridge Analyzer and Converter.
- For data movement, use Parquet via cloud storage as the most broadly compatible path.
- Automation: for many tables, script the process or employ Databricks batch jobs for efficiency (see the sketch after this list).
- After migration, update your BI and analytics tools to point to the Databricks tables and views.
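For the automation reminder above, here is a minimal sketch that loops over a list of tables, assuming one Parquet prefix per exported table and a simple target naming convention (both assumptions, not Lakebridge features):

# Batch-load several exported tables into Delta (placeholder names and paths)
tables = ["customers", "orders", "line_items"]

for t in tables:
    (
        spark.read.parquet(f"s3://your-bucket/export/{t}/")
        .write.format("delta")
        .mode("overwrite")
        .saveAsTable(f"main.migrated.{t}")
    )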
This approach gives you a robust, auditable pipeline that supports the validation needed for accurate migration outcomes before you advance to the reconciliation phase.