Hi,
I am implementing a Spark job in Kotlin (unfortunately a must-have) that reads from and writes to MongoDB. The reason for using Spark is to reuse existing code in a MapFunction. The result of applying that map is a Dataset of type Consumer, a custom class from our code base that is serializable with the kotlinx serializer. I also have code that serializes a Consumer into a BsonDocument.
In my first attempt, I typed the MapFunction to return a BsonDocument and then called:
rm.write().format("mongodb").mode("append").save()
with rm being the Dataset of type BsonDocument. However, that stores the data as binary, like this:
Binary.createFromBase64('rO0ABXNyAChvcmcuYnNvbi5Cc29uRG9jdW1lbnQkU2VyaWFsaXphdGlvblByb3h5AAAAAAAAAAECAAFbAAVieXRlc3QAAltCeHB1…', 0)
I assume that the MongoDB connector's writer serializes the BsonDocuments again.
Is this the case?
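For what it is worth, the visible part of the Base64 value can be decoded locally to see what kind of payload it is. Below is a minimal sketch that uses only java.util.Base64. The names STORED_PREFIX, headerHex and serializedClassName are made up for this check, and the byte offsets assume the standard Java serialization stream layout (magic number, version, TC_OBJECT, TC_CLASSDESC, then a length-prefixed class name).

```kotlin
import java.util.Base64

// Visible prefix of the stored Binary value, cut off where the original output is truncated.
const val STORED_PREFIX =
    "rO0ABXNyAChvcmcuYnNvbi5Cc29uRG9jdW1lbnQkU2VyaWFsaXphdGlvblByb3h5" +
    "AAAAAAAAAAECAAFbAAVieXRlc3QAAltCeHB1"

// First four bytes as hex; a Java serialization stream starts with AC ED 00 05.
fun headerHex(bytes: ByteArray): String =
    bytes.take(4).joinToString(" ") { "%02X".format(it) }

// Assumed layout: bytes 4-5 are TC_OBJECT and TC_CLASSDESC, bytes 6-7 hold the
// class name length (big-endian), and the class name itself starts at byte 8.
fun serializedClassName(bytes: ByteArray): String {
    val length = ((bytes[6].toInt() and 0xFF) shl 8) or (bytes[7].toInt() and 0xFF)
    return String(bytes, 8, length, Charsets.UTF_8)
}

fun main() {
    val bytes = Base64.getDecoder().decode(STORED_PREFIX)
    println(headerHex(bytes))           // AC ED 00 05
    println(serializedClassName(bytes)) // org.bson.BsonDocument$SerializationProxy
}
```

So the stored value looks like a Java-serialized BsonDocument rather than BSON. That would be consistent with the documents being Java-serialized somewhere before or during the write, but I have not confirmed where that happens.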
How can I write the dataset of consumers to MongoDB and have them stored as normal documents?
Thank you