MongoDB to Databricks: driver killed and compute re-attached

DoredlaCharan
New Contributor III

I am reading data from MongoDB with spark.read, which uses the mongo-spark-connector. By default the connector's sampleSize is 1000, meaning it inspects only 1000 documents in the collection to infer the columns of the DataFrame. To capture every column, I increased sampleSize to the number of documents in the collection. In my case each document has 100+ keys.
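To make the sampling behaviour described above concrete, here is a minimal plain-Python sketch of schema inference by sampling. It is only an illustration, not the connector's actual implementation: the function name infer_columns and the toy documents are hypothetical, and it takes the first N documents so the result is deterministic, whereas the connector samples the collection itself.

```python
def infer_columns(documents, sample_size):
    """Return the sorted union of top-level keys found in the first
    `sample_size` documents (a simplified stand-in for schema sampling)."""
    columns = set()
    for doc in documents[:sample_size]:
        columns.update(doc.keys())
    return sorted(columns)


# Hypothetical collection: every document has "_id" and "name",
# but only the very last document carries the key "rare_field".
docs = [{"_id": i, "name": f"user{i}"} for i in range(5000)]
docs[-1]["rare_field"] = True

small = infer_columns(docs, sample_size=1000)      # sample misses the rare key
full = infer_columns(docs, sample_size=len(docs))  # sample covers every document

print(small)
print(full)
```

The small sample never sees rare_field, so that column would be missing from the inferred schema, while sampling every document finds it at the cost of inspecting the whole collection before any data is loaded.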

Compute used: Legacy compute

Code:

df = spark.read \
    .format("mongodb") \
    .option("spark.mongodb.connection.uri", mongo_url) \
    .option("database", database) \
    .option("collection", collection) \
    .option("mergeSchema", "true") \
    .option("partitioner", "MongoShardedPartitioner") \
    .option("partitionerOptions.shardKey", "_id") \
    .option("sampleSize", "100000") \
    .load()

Error:

"The spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically reattached.
	at com.databricks.spark.chauffeur.Chauffeur.onDriverStateChange(Chauffeur.scala:2035)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)"