Unable to stream from Google Pub/Sub
05-18-2022 03:24 PM
I am trying to run the code below to subscribe to a Pub/Sub subscription, but it throws this exception:
java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/DataSourceV2
I have tried every version of https://mvnrepository.com/artifact/com.google.cloud/pubsublite-spark-sql-streaming, with no luck so far.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('Simple Pub/Sub Lite Read').getOrCreate()
df = spark.readStream \
    .format("pubsublite") \
    .option("pubsublite.subscription", "My subscription path") \
    .option("gcp.credentials.key", "my gcp credential") \
    .load()
df.show(10, False)
05-19-2022 12:23 AM
Can you retry without creating a SparkSession? Databricks provides one for you.
06-01-2022 08:37 PM
Hi @cloud user, as of now we do not have Structured Streaming support for Pub/Sub. The sources currently supported by Structured Streaming are listed here:
https://docs.gcp.databricks.com/spark/latest/structured-streaming/data-sources.html
08-17-2023 10:02 PM
Hi @210573
Databricks now supports Pub/Sub streaming natively, so you can use Pub/Sub streaming for your use case. For more information, see the official Databricks documentation.
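For anyone landing here, a minimal sketch of what a read through the native connector might look like. The format name "pubsub" and the option names (projectId, topicId, subscriptionId, plus service-account fields such as clientId, clientEmail, privateKey, privateKeyId) are assumptions taken from memory of the Databricks documentation, and the helper names are hypothetical, so please check them against the docs for your runtime version.

```python
# Sketch only: option names and the "pubsub" format string are assumptions
# to verify against the Databricks Pub/Sub connector documentation.

def build_pubsub_options(project_id, topic_id, subscription_id, auth_options):
    """Collect reader options for the Pub/Sub source into one dict.

    auth_options is a dict of service-account fields (assumed keys:
    clientId, clientEmail, privateKey, privateKeyId), ideally loaded
    from a secret scope rather than hard-coded.
    """
    required = {
        "projectId": project_id,
        "topicId": topic_id,
        "subscriptionId": subscription_id,
    }
    # Fail early with a readable message instead of a connector error later.
    missing = [name for name, value in required.items() if not value]
    if missing:
        raise ValueError("missing Pub/Sub options: " + ", ".join(missing))
    return {**required, **auth_options}


def read_pubsub_stream(spark, options):
    """Return a streaming DataFrame; `spark` is the session Databricks provides."""
    return spark.readStream.format("pubsub").options(**options).load()
```

In a notebook this would be called as read_pubsub_stream(spark, build_pubsub_options(...)), using the existing spark session rather than building a new one.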
02-09-2025 12:12 AM
I see some issues when using Pub/Sub as a source.
In writeStream, neither .foreach nor .foreachBatch gets called when stream data arrives.
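For comparison, here is a minimal sketch of the usual foreachBatch wiring. It does not claim to explain the behaviour described above. The names process_batch and start_foreach_batch are hypothetical, and stream_df is assumed to be a streaming DataFrame already loaded from the Pub/Sub source.

```python
def process_batch(batch_df, batch_id):
    """Handler that Spark calls once per micro-batch, on the driver."""
    # count() forces evaluation, so the output confirms the handler actually ran.
    print(f"batch {batch_id}: {batch_df.count()} rows")


def start_foreach_batch(stream_df, checkpoint_path, handler=process_batch):
    """Start a streaming query that passes each micro-batch to `handler`.

    Nothing runs until start() is called, and a checkpoint location is
    needed so the query can track its progress across restarts.
    """
    return (
        stream_df.writeStream
        .foreachBatch(handler)
        .option("checkpointLocation", checkpoint_path)
        .start()
    )
```

If the handler still never fires, it may be worth inspecting the returned query object (isActive, status, lastProgress) to see whether any micro-batches are being planned at all.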

