Sidhant07
Databricks Employee
Databricks Employee

To move Delta tables from a Dev workspace to a Prod workspace, you can use a combination of Delta Lake features and Databricks APIs. Here's a high-level approach with some Python code to help you get started:

Method 1: Using CLONE command

For smaller tables or when both workspaces can access the same storage:

from delta.tables import *

# Connect to Dev workspace
dev_spark = # Initialize SparkSession for Dev workspace

# Connect to Prod workspace
prod_spark = # Initialize SparkSession for Prod workspace

def clone_table(table_name):
    # Clone the table from Dev to Prod
    dev_table = DeltaTable.forName(dev_spark, f"dev_database.{table_name}")
    dev_table.clone(
        target=f"prod_database.{table_name}",
        isShallow=False,
        replace=True
    )

# Get list of tables to clone
tables_to_clone = dev_spark.sql("SHOW TABLES IN dev_database").select("tableName").collect()

# Clone each table
for row in tables_to_clone:
    clone_table(row.tableName)

Method 2: Using external storage

For larger tables or when workspaces can't directly access each other's storage:

  1. Export data to external storage:
    def export_table(table_name, external_path):
        df = dev_spark.table(f"dev_database.{table_name}")
        df.write.format("delta").mode("overwrite").save(f"{external_path}/{table_name}")
    
    # Export all tables
    external_path = "abfss://your-container@your-storage-account.dfs.core.windows.net/export"
    tables_to_export = dev_spark.sql("SHOW TABLES IN dev_database").select("tableName").collect()
    
    for row in tables_to_export:
        export_table(row.tableName, external_path)
    1. Import data in Prod workspace:
      def import_table(table_name, external_path):
          df = prod_spark.read.format("delta").load(f"{external_path}/{table_name}")
          df.write.format("delta").mode("overwrite").saveAsTable(f"prod_database.{table_name}")
      
      # Import all tables
      for row in tables_to_export:
          import_table(row.tableName, external_path)

View solution in original post