05-18-2022 03:24 PM
I am trying to run the code below to subscribe to a Pub/Sub Lite subscription, but it throws this exception:
java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/DataSourceV2
I have tried every version of https://mvnrepository.com/artifact/com.google.cloud/pubsublite-spark-sql-streaming with no luck so far.
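from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('Simple Pub/Sub Lite Read').getOrCreate()

# Read from a Pub/Sub Lite subscription as a streaming DataFrame.
df = spark.readStream \
    .format("pubsublite") \
    .option("pubsublite.subscription", "My subscription path") \
    .option("gcp.credentials.key", "my gcp credential") \
    .load()

# show() is not supported on a streaming DataFrame, so print rows through the console sink instead.
df.writeStream.format("console").option("numRows", 10).option("truncate", False).start()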
05-19-2022 12:23 AM
Can you retry without creating a SparkSession? Databricks provides one for you.
06-01-2022 08:37 PM
Hi @cloud user, as of now we do not support Pub/Sub as a Structured Streaming source. The sources currently supported by Structured Streaming are listed here:
https://docs.gcp.databricks.com/spark/latest/structured-streaming/data-sources.html
08-17-2023 10:02 PM
Hi @210573,
Databricks now supports Pub/Sub streaming natively, so you can start using Pub/Sub streaming for your use case. For more info, visit the official URL below -
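For anyone landing here, this is a minimal sketch of what a read from the native connector might look like. The option names (projectId, topicId, subscriptionId, and the four service-account fields) are as I recall them from the Databricks Pub/Sub connector docs, so please verify them against the official page. The helper names pubsub_options and read_pubsub are hypothetical, and all values are placeholders.

```python
# Sketch only: option names are recalled from the Databricks Pub/Sub connector
# docs and should be verified; every value passed in is a placeholder.

REQUIRED_AUTH_KEYS = ("clientId", "clientEmail", "privateKey", "privateKeyId")


def pubsub_options(project_id, topic_id, subscription_id, auth):
    """Build the option map for spark.readStream.format("pubsub")."""
    missing = [key for key in REQUIRED_AUTH_KEYS if key not in auth]
    if missing:
        raise ValueError(f"missing auth options: {missing}")
    options = {
        "projectId": project_id,
        "topicId": topic_id,
        "subscriptionId": subscription_id,
    }
    options.update(auth)
    return options


def read_pubsub(spark, options):
    """Return a streaming DataFrame; `spark` is the session Databricks provides."""
    return spark.readStream.format("pubsub").options(**options).load()
```

In a notebook you would call read_pubsub(spark, pubsub_options(...)) with the existing spark session rather than building a new one, and keep the credential values in a secret scope instead of in the notebook.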
02-09-2025 12:12 AM
I see some issues when using Pub/Sub as a source.
In writeStream, neither .foreach nor .foreachBatch is called when stream data arrives.
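To help narrow this down, here is a minimal sketch of a foreachBatch handler that only logs the size of each micro-batch. It is not a confirmed fix. The names log_batch and start_logging_query are hypothetical, stream_df is assumed to be the streaming DataFrame returned by the Pub/Sub read, and checkpoint_path is a placeholder location.

```python
def log_batch(batch_df, batch_id):
    """foreachBatch handler: report how many rows each micro-batch holds."""
    row_count = batch_df.count()
    print(f"batch {batch_id}: {row_count} rows")
    return row_count  # Spark ignores the return value; it is returned for testing


def start_logging_query(stream_df, checkpoint_path):
    """Start the stream; the handler only runs after start() has been called."""
    return (
        stream_df.writeStream
        .foreachBatch(log_batch)
        .option("checkpointLocation", checkpoint_path)
        .start()
    )
```

If the handler never prints anything, checking query.status and query.lastProgress on the returned StreamingQuery should show whether any micro-batches are being triggered at all, which separates a source problem from a sink problem.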