Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Migrate on-premises Delta tables to Databricks (Azure)

tonkol
New Contributor II

Hi there,

We've decided to migrate our on-premises Delta Lake to Azure Databricks.

Because of networking constraints, I can only "push" the data from on-prem to the cloud.

What would be the best way to replicate all tables: schema, partitioning info, and all data?

I wanted to just get the CREATE TABLE statements and then insert the data, but I've hit this open issue... 😞

https://github.com/delta-io/delta/issues/1032

Thanks for your suggestions

1 REPLY

mukul1409
New Contributor II

The correct solution is not SQL-based.

Delta tables are defined by the contents of the _delta_log directory, not by CREATE TABLE statements. That is why SHOW CREATE TABLE cannot reconstruct partitioning, table properties, or constraints.

The only reliable migration method in a push-only network setup is to replicate the raw Delta folders, including the _delta_log directory.

RECOMMENDED MIGRATION PATTERN (PUSH ONLY)

  1. On-prem: copy full Delta folders to ADLS Gen2

Example logic in Spark (see the sketch after this list):

  • List the tables in the source database

  • For each table, read its storage location using DESCRIBE DETAIL

  • Recursively copy the entire table folder to ADLS, including the _delta_log folder and the Parquet data files
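A minimal PySpark sketch of that loop, assuming the on-prem Spark installation has the ADLS Gen2 (abfss) Hadoop connector configured and authenticated; SOURCE_DB and TARGET_ROOT are placeholder names:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

SOURCE_DB = "mydb"  # placeholder: on-prem database to migrate
TARGET_ROOT = "abfss://container@account.dfs.core.windows.net/migrated"  # placeholder

hadoop_conf = spark._jsc.hadoopConfiguration()
Path = spark._jvm.org.apache.hadoop.fs.Path
FileUtil = spark._jvm.org.apache.hadoop.fs.FileUtil

for row in spark.sql(f"SHOW TABLES IN {SOURCE_DB}").collect():
    # DESCRIBE DETAIL returns one row; its 'location' column is the table's root path
    detail = spark.sql(f"DESCRIBE DETAIL {SOURCE_DB}.{row.tableName}").collect()[0]
    src = Path(detail["location"])
    dst = Path(f"{TARGET_ROOT}/{row.tableName}")

    # Recursive copy of the whole table folder, _delta_log included
    FileUtil.copy(src.getFileSystem(hadoop_conf), src,
                  dst.getFileSystem(hadoop_conf), dst,
                  False, True, hadoop_conf)

For very large tables, Hadoop DistCp performs the same recursive copy in parallel and is usually the better tool.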

This copies:

  • the _delta_log transaction files

  • all Parquet data files

  • schema, partitioning, constraints, and table properties

  • the Delta version history

  2. On Azure Databricks: register the tables

Create the schema if missing, then create tables using:

CREATE TABLE schema.table
USING DELTA
LOCATION 'abfss path to the copied folder'

Do NOT use CTAS, INSERT, or recreated DDL.

The table is now identical to the on-prem table.
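A sketch of registering everything that was copied, reusing the placeholder names from the copy step (spark is predefined in Databricks notebooks; the table list could also be derived by listing the folders under TARGET_ROOT):

TARGET_SCHEMA = "mydb"  # placeholder target schema
TARGET_ROOT = "abfss://container@account.dfs.core.windows.net/migrated"  # placeholder

spark.sql(f"CREATE SCHEMA IF NOT EXISTS {TARGET_SCHEMA}")

for table_name in ["sales", "customers"]:  # placeholder table list
    spark.sql(f"""
        CREATE TABLE IF NOT EXISTS {TARGET_SCHEMA}.{table_name}
        USING DELTA
        LOCATION '{TARGET_ROOT}/{table_name}'
    """)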

OPTIONAL INCREMENTAL REPLICATION

Enable Delta Change Data Feed on the source tables (table property delta.enableChangeDataFeed = true) and push only the changes.

Read the change feed starting from the last migrated version and write it to an ADLS change folder.

On Azure Databricks, MERGE the change data into the target Delta table using the primary key and change type.
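For example, the push side on-prem might look like the sketch below, where LAST_VERSION is a checkpoint the pipeline tracks and mydb.sales is a placeholder table:

LAST_VERSION = 42  # placeholder: last version already pushed

changes = (spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", LAST_VERSION + 1)
    .table("mydb.sales"))

(changes.write.format("delta")
    .mode("append")
    .save("abfss://container@account.dfs.core.windows.net/changes/sales"))

And the apply side on Azure Databricks, assuming id is the primary key (a robust job would first deduplicate the staged rows to the latest change per key):

from delta.tables import DeltaTable

staged = (spark.read.format("delta")
    .load("abfss://container@account.dfs.core.windows.net/changes/sales")
    .filter("_change_type != 'update_preimage'"))  # drop the old-value rows of updates

(DeltaTable.forName(spark, "mydb.sales").alias("t")
    .merge(staged.alias("s"), "t.id = s.id")
    .whenMatchedDelete(condition="s._change_type = 'delete'")
    .whenMatchedUpdateAll(condition="s._change_type = 'update_postimage'")
    .whenNotMatchedInsertAll(condition="s._change_type = 'insert'")
    .execute())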

WHY SQL-BASED MIGRATION FAILS

SHOW CREATE TABLE

  • misses partitioning, table properties, and constraints (the open issue linked above)

CTAS or INSERT

  • loses metadata and version history

Copying the _delta_log directory

  • results in a perfect table clone


Mukul Chauhan