Nghiaht1
New Contributor III

This is how I can config to run PySpark (scala 2.12 Spark 3.2.1) Structure Streaming with Kafka on jupyter lab (need to download 2 jars file spark-sql-kafka-0-10_2.12-3.2.1.jar, kafka-clients-2.1.1.jar to folder jars)

spark = SparkSession\

  .builder\

  .config("spark.jars", os.getcwd() + "/jars/spark-sql-kafka-0-10_2.12-3.2.1.jar" + "," + os.getcwd() + "/jars/kafka-clients-2.1.1.jar") \

  .appName("Structured_Redpanda_WordCount")\

  .getOrCreate()

spark.conf.set("spark.sql.shuffle.partitions", 1)