
Question on how to properly write a dataset of custom objects to MongoDB

Mathias_Peters
Contributor II

Hi, 

I am implementing a Spark job in Kotlin (unfortunately a must-have) which reads from and writes to MongoDB. The reason for using Kotlin is to reuse existing code in a MapFunction. The result of applying that map is a Dataset of type Consumer, a custom object from our code base, which is serializable with kotlinx.serialization. I also have code available to serialize a Consumer into a BsonDocument.

In my first attempt, I typed the MapFunction to return a BsonDocument and then called:

rm.write().format("mongodb").mode("append").save()

 with rm being the dataset of type BsonDocument. However, that stores the data as binary, like this:

Binary.createFromBase64('rO0ABXNyAChvcmcuYnNvbi5Cc29uRG9jdW1lbnQkU2VyaWFsaXphdGlvblByb3h5AAAAAAAAAAECAAFbAAVieXRlc3QAAltCeHB1…', 0)
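
As a quick sanity check, the visible start of that value can be decoded to see what kind of bytes were stored. The following is a minimal sketch, not part of the job itself. It uses only the first 64 characters shown above (well before the elided part), and the helper names isJavaSerialized and serializedClassName are made up for illustration. It assumes the standard Java object serialization stream layout: a 4-byte header, two tag bytes, then a length-prefixed class name.

```kotlin
import java.util.Base64

// Visible prefix of the stored value, cut at a 4-character Base64 boundary
// well before the elided part of the original string.
const val STORED_PREFIX =
    "rO0ABXNyAChvcmcuYnNvbi5Cc29uRG9jdW1lbnQkU2VyaWFsaXphdGlvblByb3h5"

// Java's ObjectOutputStream starts every stream with magic 0xACED and version 0x0005.
fun isJavaSerialized(bytes: ByteArray): Boolean =
    bytes.size >= 4 &&
        bytes[0] == 0xAC.toByte() && bytes[1] == 0xED.toByte() &&
        bytes[2] == 0x00.toByte() && bytes[3] == 0x05.toByte()

// After the header come TC_OBJECT (0x73) and TC_CLASSDESC (0x72), then a
// two-byte big-endian length followed by the class name itself.
fun serializedClassName(bytes: ByteArray): String {
    val length = ((bytes[6].toInt() and 0xFF) shl 8) or (bytes[7].toInt() and 0xFF)
    return String(bytes, 8, length, Charsets.UTF_8)
}

fun main() {
    val bytes = Base64.getDecoder().decode(STORED_PREFIX)
    println(isJavaSerialized(bytes))
    println(serializedClassName(bytes))
}
```

Running this prints true followed by org.bson.BsonDocument$SerializationProxy, which suggests that each stored value is a Java-serialized BsonDocument object held in a single binary field rather than a document with the expected fields.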

I assume that the MongoDB dataset writer serializes the BsonDocuments a second time.

Is this the case?
How can I write the dataset of consumers to MongoDB and have them stored as normal documents?

Thank you

 

 
