MongoDB to Databricks: driver killed and compute re-attached
02-09-2026 08:14 PM
I started reading data from MongoDB with spark.read, which uses the MongoDB Spark Connector (mongo-spark-connector). By default the connector uses a sample size of 1,000: it looks at only 1,000 documents in the collection to decide which fields become columns in the DataFrame, so fields that appear only in unsampled documents are missed. My documents have 100+ keys, so I increased sampleSize to the number of documents in the collection.
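To see why the default sample size can drop columns, here is a small back-of-the-envelope sketch. It assumes (an assumption, not something confirmed in this post) that the connector draws a uniform random sample of sampleSize documents without replacement. Under that assumption, the chance that a field carried by only a few documents is never seen follows the hypergeometric distribution. The function name miss_probability and the collection sizes below are hypothetical, chosen only for illustration.

```python
import math


def miss_probability(n_docs: int, docs_with_key: int, sample_size: int) -> float:
    """Probability that a uniform random sample of `sample_size` documents,
    drawn without replacement from `n_docs`, contains none of the
    `docs_with_key` documents that carry a given field (hypergeometric).

    Hypothetical helper: it models sampling in general, not the
    connector's actual implementation.
    """
    # A sample can never be larger than the collection itself.
    sample_size = min(sample_size, n_docs)
    # Ways to pick the whole sample from documents WITHOUT the field,
    # divided by all ways to pick the sample. math.comb returns 0 when
    # the sample is larger than the pool, i.e. the field cannot be missed.
    return math.comb(n_docs - docs_with_key, sample_size) / math.comb(n_docs, sample_size)


# Hypothetical collection: 100,000 documents, a field present in only 50 of them.
print(miss_probability(100_000, 50, 1_000))    # default-sized sample
print(miss_probability(100_000, 50, 100_000))  # sample covers the whole collection
```

With these made-up numbers, the field is missed roughly 60% of the time at a sample of 1,000 and never when the sample covers every document. The cost of the larger sample is that every sampled document has to be read during schema inference.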
Compute used: Legacy compute
Code:
df = (
    spark.read
    .format("mongodb")
    .option("spark.mongodb.connection.uri", mongo_url)
    .option("database", database)
    .option("collection", collection)
    .option("mergeSchema", "true")
    .option("partitioner", "MongoShardedPartitioner")
    .option("partitionerOptions.shardKey", "_id")
    # Number of documents sampled to infer the schema (connector default: 1000)
    .option("sampleSize", "100000")
    .load()
)
Error:
"The spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically reattached.
at com.databricks.spark.chauffeur.Chauffeur.onDriverStateChange(Chauffeur.scala:2035)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)"
Labels: Spark