How to write ObjectId value using Spark connector 10.2.2

ask005 — Sun, 13 Jul 2025 13:46:10 GMT

In the PySpark MongoDB connector, how do I handle _id as an ObjectId when updating records?

Spark 3.2.4
Scala 2.13
Spark Mongo connector 2.12-10.2.2

Re: How to write ObjectId value using Spark connector 10.2.2

mark_ott — Tue, 11 Nov 2025 10:54:49 GMT

To write an ObjectId value with Spark Mongo Connector 10.2.2 in PySpark when updating records, convert the ObjectId hex string into an extended JSON string. The connector does not treat a plain string as an ObjectId; unless you do this, it stores the value in MongoDB as a string rather than as the expected BSON ObjectId type.

Required Technique

Format the ObjectId value as an extended JSON string: {"$oid": "<hex string here>"}
When creating or updating your DataFrame, convert the _id field (or any other ObjectId field) into this format. Keep the column as a string column, since convertJson applies to string values.
Set the Spark option: .config("spark.mongodb.write.convertJson", "objectOrArrayOnly")
- This tells the connector to parse string values that hold a JSON object or array into BSON, so the {"$oid": ...} string is written to MongoDB as a BSON ObjectId.
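
As a rough illustration of the string format described above, here is a minimal plain-Python sketch. The helper name `to_oid_json` is hypothetical and not part of the connector; it only checks that the input looks like a 24-character hex id and wraps it in the extended JSON form.

```python
import json
import re

# A 24-character hexadecimal string is the textual form of an ObjectId.
_HEX24 = re.compile(r"[0-9a-fA-F]{24}")


def to_oid_json(hex_id: str) -> str:
    """Wrap a hex ObjectId string as an extended JSON string (hypothetical helper)."""
    if not _HEX24.fullmatch(hex_id):
        raise ValueError(f"not a 24-character hex ObjectId: {hex_id!r}")
    # json.dumps handles the quoting, so the result is always valid JSON.
    return json.dumps({"$oid": hex_id})


example = to_oid_json("507f1f77bcf86cd799439011")
print(example)  # {"$oid": "507f1f77bcf86cd799439011"}
```

A helper like this may be handy when rows are assembled in driver code before calling createDataFrame; for an existing DataFrame, a column expression avoids the Python round trip.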

PySpark Example

python

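from pyspark.sql.functions import col, concat, lit

# Example with an existing DataFrame `df` whose '_id' column holds the hex id as a string.
# Wrap it as an extended JSON string: {"$oid": "<hex>"}
df = df.withColumn("_id", concat(lit('{"$oid": "'), col("_id"), lit('"}')))

# Write with JSON conversion enabled so the string becomes a BSON ObjectId.
# operationType "update" updates documents matched on _id instead of
# replacing them (the default operationType is "replace").
df.write \
  .format("mongodb") \
  .option("spark.mongodb.write.connection.uri", "mongodb://host:port/database.collection") \
  .option("spark.mongodb.write.convertJson", "objectOrArrayOnly") \
  .option("spark.mongodb.write.operationType", "update") \
  .mode("append") \
  .save()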

Ensure that each ObjectId column holds a string of the form {"$oid": "hexid"} before writing.
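
One way to check this before writing is sketched below in plain Python. The function names `is_oid_json` and `invalid_ids` are hypothetical, and the check reflects only the {"$oid": "<24 hex chars>"} shape discussed here, not the connector's own parsing rules.

```python
import json
import re

_HEX24 = re.compile(r"[0-9a-fA-F]{24}")


def is_oid_json(value) -> bool:
    """Return True if value is a JSON object of the form {"$oid": "<24 hex chars>"}."""
    try:
        doc = json.loads(value)
    except (TypeError, ValueError):
        # Not a string, or not valid JSON (a bare hex id ends up here or below).
        return False
    if not isinstance(doc, dict) or set(doc) != {"$oid"}:
        return False
    oid = doc["$oid"]
    return isinstance(oid, str) and _HEX24.fullmatch(oid) is not None


def invalid_ids(values):
    """Collect the values that do not have the expected wrapper shape."""
    return [v for v in values if not is_oid_json(v)]


sample = [
    '{"$oid": "507f1f77bcf86cd799439011"}',  # well formed
    "507f1f77bcf86cd799439011",              # bare hex id, would stay a string
    '{"$oid": "short"}',                     # wrapper present, id malformed
]
print(invalid_ids(sample))
```

In practice the input could be a small sample pulled from the DataFrame, for example the _id values from df.select("_id").limit(100).collect().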

MongoDB Connector Version, Spark, and Scala

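Connector 10.2.2 works with Spark 3.2.4, but the artifact's Scala suffix must match the Scala version of your Spark build: on Scala 2.13, use mongo-spark-connector_2.13-10.2.2 rather than the _2.12 artifact.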
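
To make the version pairing concrete, the sketch below derives the Maven coordinate from a Scala version, assuming the artifact suffix has to match the Scala binary version of the Spark build. The function name `connector_coordinate` is hypothetical; the group and artifact ids are those the connector is published under.

```python
def connector_coordinate(scala_version: str, connector_version: str) -> str:
    """Build the connector's Maven coordinate (hypothetical helper)."""
    # Only the Scala binary version (major.minor) appears in the artifact name.
    scala_binary = ".".join(scala_version.split(".")[:2])
    return f"org.mongodb.spark:mongo-spark-connector_{scala_binary}:{connector_version}"


coordinate = connector_coordinate("2.13.8", "10.2.2")
print(coordinate)  # org.mongodb.spark:mongo-spark-connector_2.13:10.2.2
```

The resulting string is what you would pass to spark.jars.packages or to the --packages flag of spark-submit.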

Key Details

Plain string ObjectIds are written as strings; use the extended JSON string format above to get a true ObjectId in MongoDB.
This applies to both insert and update operations.