To move Delta tables from a Dev workspace to a Prod workspace, you can use a combination of Delta Lake features and Databricks APIs. Here's a high-level approach with some Python code to help you get started:
Method 1: Using the CLONE command
For smaller tables or when both workspaces can access the same storage:
# Method 1 runs in a single Spark session that can see both databases, for
# example when the Dev and Prod workspaces share a metastore (such as Unity
# Catalog). In a Databricks notebook that session is already available as spark.

def clone_table(table_name):
    # DEEP CLONE copies the data files and table metadata from Dev into Prod
    spark.sql(f"""
        CREATE OR REPLACE TABLE prod_database.{table_name}
        DEEP CLONE dev_database.{table_name}
    """)

# Get the list of tables to clone
tables_to_clone = spark.sql("SHOW TABLES IN dev_database").select("tableName").collect()

# Clone each table
for row in tables_to_clone:
    clone_table(row.tableName)
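If you want a quick sanity check after cloning, here is a minimal sketch (assuming the same session and database names as above) that compares row counts between each source table and its clone:

# Optional sanity check: compare row counts between each Dev table and its Prod clone
for row in tables_to_clone:
    dev_count = spark.table(f"dev_database.{row.tableName}").count()
    prod_count = spark.table(f"prod_database.{row.tableName}").count()
    print(f"{row.tableName}: dev={dev_count}, prod={prod_count}")

If you only need a lightweight copy for testing rather than a full promotion, SHALLOW CLONE copies the table metadata without copying the data files.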
Method 2: Using external storage
For larger tables or when workspaces can't directly access each other's storage:
- Export data to external storage (run this in the Dev workspace; configuring access to the external path is sketched after the import step):

def export_table(table_name, external_path):
    # Write each table out as a standalone Delta table under the external path
    df = spark.table(f"dev_database.{table_name}")
    df.write.format("delta").mode("overwrite").save(f"{external_path}/{table_name}")

# Export all tables
external_path = "abfss://your-container@your-storage-account.dfs.core.windows.net/export"
tables_to_export = spark.sql("SHOW TABLES IN dev_database").select("tableName").collect()
for row in tables_to_export:
    export_table(row.tableName, external_path)
- Import data in Prod workspace (run this in the Prod workspace, which needs read access to the same external path):

def import_table(table_name, external_path):
    # Register the exported data as a managed table in the Prod metastore
    df = spark.read.format("delta").load(f"{external_path}/{table_name}")
    df.write.format("delta").mode("overwrite").saveAsTable(f"prod_database.{table_name}")

# Import all tables; the Dev session's table list is not available here,
# so derive the table names from the exported folder names
external_path = "abfss://your-container@your-storage-account.dfs.core.windows.net/export"
tables_to_import = [f.name.rstrip("/") for f in dbutils.fs.ls(external_path)]
for table_name in tables_to_import:
    import_table(table_name, external_path)
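Both steps above assume each workspace can read and write the external path. If you are not using Unity Catalog external locations or workspace mounts, one option is session-scoped OAuth configuration with an Azure service principal; the sketch below uses placeholder values plus an assumed secret scope ("my-scope") and key names, so adapt it to your setup:

storage_account = "your-storage-account"
# Assumed secret scope and key names; keep the service principal credentials in a secret store
client_id = dbutils.secrets.get("my-scope", "sp-client-id")
client_secret = dbutils.secrets.get("my-scope", "sp-client-secret")
tenant_id = dbutils.secrets.get("my-scope", "sp-tenant-id")

spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net", client_id)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net", client_secret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net",
               f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")

Because the import rewrites the data rather than cloning it, the Prod tables start with a fresh Delta transaction history, and any table properties or constraints defined in Dev need to be reapplied.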