I am trying to load data from MongoDB into Spark. I am using the Community (free) edition of Databricks, so my Jupyter notebook runs in a Chrome browser.
Here is my code:
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .config("spark.mongodb.read.connection.uri", uri) \
    .config("spark.mongodb.output.uri", uri) \
    .config("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector_2.12:10.1.1") \
    .getOrCreate()

database = db
collection = tweets

df = spark.read.format("mongodb") \
    .option("uri", uri) \
    .option("database", database) \
    .option("collection", collection) \
    .load()
This is the error:
df.display()
[DATA_SOURCE_NOT_FOUND] Failed to find the data source: mongodb. Make sure the provider name is correct and the package is properly registered and compatible with your Spark version. SQLSTATE: 42K02
This project is for a class, so please treat me as a novice. The data is in the correct MongoDB collection, my URI and all other variables are correct, and the MongoDB deployment pinged successfully. I am willing to provide any other necessary information. I have spent over three hours trying to fix this.
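Since the error mentions compatibility with the Spark version, here is a minimal sketch that splits the connector coordinate from my code into its parts. The helper `parse_maven_coordinate` and the `ConnectorCoordinate` type are hypothetical names of my own, not part of PySpark or the MongoDB connector. The sketch assumes the usual `group:artifact_scalaVersion:version` layout and does not check anything against the actual cluster.

```python
from typing import NamedTuple


class ConnectorCoordinate(NamedTuple):
    group: str
    artifact: str
    scala_binary: str
    version: str


def parse_maven_coordinate(coordinate: str) -> ConnectorCoordinate:
    """Split 'group:artifact_scala:version' into its parts (hypothetical helper)."""
    parts = coordinate.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected group:artifact:version, got {coordinate!r}")
    group, artifact_with_scala, version = parts
    # The Scala binary version is the suffix after the last underscore.
    artifact, sep, scala_binary = artifact_with_scala.rpartition("_")
    if not sep:
        raise ValueError(f"no Scala suffix in artifact {artifact_with_scala!r}")
    return ConnectorCoordinate(group, artifact, scala_binary, version)


# The coordinate string is copied from my spark.jars.packages setting.
coord = parse_maven_coordinate("org.mongodb.spark:mongo-spark-connector_2.12:10.1.1")
print(coord.scala_binary, coord.version)
```

The Scala suffix and the connector version are the two values I would compare against whatever runtime my cluster reports, if that is relevant to the error.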
Please help me, thank you.