Unable to stream from Google Pub/Sub

210573
New Contributor

I am trying to run the code below to subscribe to Pub/Sub, but it throws this exception:

java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/DataSourceV2

I have tried all versions of https://mvnrepository.com/artifact/com.google.cloud/pubsublite-spark-sql-streaming with no luck so far.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('Simple Pub/Sub Lite Read').getOrCreate()

df = spark.readStream \
 .format("pubsublite") \
 .option("pubsublite.subscription", "My subscription path") \
 .option("gcp.credentials.key", "my gcp credential").load()

df.show(10, False)

-werners-
Esteemed Contributor III

Can you retry without creating a SparkSession? Databricks provides one for you.

Noopur_Nigam
Databricks Employee

Hi @cloud user, as of now we do not have Structured Streaming support for Pub/Sub. The sources supported by Structured Streaming are listed here:

https://docs.gcp.databricks.com/spark/latest/structured-streaming/data-sources.html

Ajay-Pandey
Databricks MVP

Hi @210573 

Databricks now supports Pub/Sub streaming natively, so you can use Pub/Sub streaming for your use case. For more information, see the official documentation:

PUB/SUB with Databricks 
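A minimal sketch of what a read with the native connector might look like. The format name "pubsub" and the option names (projectId, topicId, subscriptionId, clientId, clientEmail, privateKey, privateKeyId) are given here as I understand them from the Databricks documentation, so please verify them against the docs for your runtime. The helper names pubsub_read_options and read_pubsub_stream are hypothetical and exist only for illustration.

```python
def pubsub_read_options(project_id, topic_id, subscription_id, credentials):
    """Build the option map for the Pub/Sub source.

    Option names are an assumption based on the Databricks Pub/Sub
    connector docs; check them against your runtime before relying on them.
    """
    options = {
        "projectId": project_id,
        "topicId": topic_id,
        "subscriptionId": subscription_id,
    }
    # Service-account fields; in practice, load these from a secret scope
    # instead of hard-coding them in a notebook.
    for key in ("clientId", "clientEmail", "privateKey", "privateKeyId"):
        options[key] = credentials[key]
    return options


def read_pubsub_stream(spark, options):
    # `spark` is the session that Databricks already provides in a notebook,
    # so no SparkSession.builder call is needed here.
    return spark.readStream.format("pubsub").options(**options).load()
```

In a notebook this would be called as read_pubsub_stream(spark, pubsub_read_options(...)). The result is a streaming DataFrame, so it has to be consumed through writeStream; calling show() on it raises an error.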

Ajay Kumar Pandey

davidkhala-ms
New Contributor II

I see some issues when using Pub/Sub as a source.

In writeStream, neither .foreach nor .foreachBatch is called when stream data arrives.
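For comparison, this is the usual shape of a foreachBatch sink in Structured Streaming. It is only a sketch: start_foreach_batch and log_batch are hypothetical names, stream_df stands for the Pub/Sub streaming DataFrame, and the checkpoint path is a placeholder. It does not explain why the callback is not firing in my case, but it shows the wiring I would expect to work.

```python
def log_batch(batch_df, batch_id):
    # foreachBatch hands the handler a static DataFrame for the micro-batch
    # together with the batch id.
    print(f"batch {batch_id}: {batch_df.count()} rows")


def start_foreach_batch(stream_df, handler, checkpoint_path):
    """Start a streaming query that passes each micro-batch to `handler`."""
    return (
        stream_df.writeStream
        .foreachBatch(handler)
        .option("checkpointLocation", checkpoint_path)
        .start()  # the handler cannot run until the query is started
    )
```

The call returns a StreamingQuery, whose status and lastProgress attributes can help tell whether any micro-batches are being processed at all.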