nayan_wylde

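One way to write a Delta table to MongoDB from Python is to:

  1. Read the Delta table with PySpark. (Pandas cannot read Delta on its own; outside Spark, the deltalake package can load a table into a Pandas DataFrame.)
  2. Convert the rows into Python dictionaries, which pymongo serializes to BSON documents.
  3. Insert the documents with a MongoDB client such as pymongo.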
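Step 2 can trip up pymongo on a couple of value types. A missing number in a Pandas DataFrame is NaN, which MongoDB would store as NaN rather than null. Spark DATE columns arrive as datetime.date objects, which BSON has no type for, so pymongo refuses to encode them. Below is a minimal sketch of a clean-up pass, assuming those two cases are the ones that matter for your table. The function name to_bson_friendly is hypothetical, and Decimal values (which would need bson.decimal128.Decimal128) are not handled.

```python
import math
from datetime import date, datetime
from typing import Any


def to_bson_friendly(value: Any) -> Any:
    """Recursively convert values pymongo would reject or store unexpectedly.

    - float NaN (Pandas' marker for a missing number) becomes None (BSON null)
    - datetime.date becomes a naive datetime at midnight (pymongo treats naive
      datetimes as UTC), since BSON has no date-only type
    - dicts and lists/tuples are walked so nested struct/array columns are covered
    Everything else is returned unchanged.
    """
    if isinstance(value, float) and math.isnan(value):
        return None
    # datetime is a subclass of date, so check it first and leave it as-is
    if isinstance(value, datetime):
        return value
    if isinstance(value, date):
        return datetime(value.year, value.month, value.day)
    if isinstance(value, dict):
        return {key: to_bson_friendly(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_bson_friendly(item) for item in value]
    return value


# Example rows shaped like DataFrame.to_dict("records") output
records = [
    {"id": 1, "score": float("nan"), "day": date(2024, 1, 31)},
    {"id": 2, "score": 9.5, "tags": ["a", "b"], "nested": {"d": date(2024, 2, 1)}},
]
docs = [to_bson_friendly(r) for r in records]
print(docs)
```

In the sample it could be applied right before the insert, e.g. `collection.insert_many([to_bson_friendly(r) for r in pandas_df.to_dict("records")])`.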

Sample code: 

from pyspark.sql import SparkSession
from pymongo import MongoClient

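# Step 1: Initialize a Spark session with Delta Lake support.
# On Databricks the built-in `spark` session is already configured for Delta;
# elsewhere the Delta Lake package (e.g. delta-spark) must be on the classpath.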
spark = SparkSession.builder \
    .appName("DeltaToMongo") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
    .getOrCreate()

# Step 2: Read Delta table
delta_df = spark.read.format("delta").load("/path/to/delta/table")

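# Step 3: Convert to a Pandas DataFrame. toPandas() collects every row onto the
# driver, so this only works for tables that fit in driver memory.
pandas_df = delta_df.toPandas()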

# Step 4: Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")
db = client["your_database"]
collection = db["your_collection"]

# Step 5: Insert the rows as documents (insert_many raises an error on an empty list)
records = pandas_df.to_dict("records")
if records:
    collection.insert_many(records)
client.close()
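Because toPandas() pulls the whole table onto the driver, larger tables are usually written from the executors instead. Below is a minimal sketch of that approach with pymongo, assuming every executor can reach MongoDB. PySpark's DataFrame.foreachPartition hands the function an iterator of Row objects, and the function inserts them in batches. The names write_partition and chunked, and the open_client factory, are hypothetical. The official MongoDB Spark Connector is the other common route and is not shown here.

```python
from itertools import islice
from typing import Any, Callable, Iterable, Iterator


def chunked(items: Iterable[Any], size: int) -> Iterator[list]:
    """Yield lists of at most `size` items without materializing the whole input."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def write_partition(rows: Iterable[Any], open_client: Callable[[], Any],
                    database: str, collection_name: str,
                    batch_size: int = 1000) -> int:
    """Insert one Spark partition's rows into MongoDB, batch by batch.

    `rows` are pyspark Row objects (anything with .asDict(recursive=True)).
    `open_client` is a hypothetical zero-argument factory returning a
    pymongo MongoClient; it runs on the executor, not the driver.
    Returns the number of documents sent.
    """
    client = open_client()
    try:
        collection = client[database][collection_name]
        written = 0
        docs = (row.asDict(recursive=True) for row in rows)
        # An empty partition yields no batches, so insert_many never sees []
        for batch in chunked(docs, batch_size):
            # ordered=False: MongoDB attempts every document in the batch even
            # if some fail (errors are still raised afterwards)
            collection.insert_many(batch, ordered=False)
            written += len(batch)
        return written
    finally:
        client.close()
```

With the names from the sample, it might be invoked as `delta_df.foreachPartition(lambda rows: write_partition(rows, lambda: MongoClient("mongodb://localhost:27017/"), "your_database", "your_collection"))`. The client is created inside each partition because a MongoClient holds open sockets and background threads and is not meant to be serialized and shipped from the driver.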