Unable to stream from google pub/sub

210573
New Contributor

I am trying to run the code below to subscribe to a Pub/Sub subscription, but it throws this exception:

java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/DataSourceV2

I have tried every version of https://mvnrepository.com/artifact/com.google.cloud/pubsublite-spark-sql-streaming with no luck so far.

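from pyspark.sql import SparkSession

# On Databricks, `spark` already exists; getOrCreate() returns that session.
spark = SparkSession.builder.appName("Simple Pub/Sub Lite Read").getOrCreate()

df = (
    spark.readStream
    .format("pubsublite")
    .option("pubsublite.subscription", "My subscription path")
    .option("gcp.credentials.key", "my gcp credential")
    .load()
)

# Streaming DataFrames do not support show(); start a streaming query instead.
query = df.writeStream.format("console").option("numRows", 10).option("truncate", False).start()
query.awaitTermination()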

4 replies

-werners-
Esteemed Contributor III

Can you retry without creating a SparkSession? Databricks provides one for you.

Noopur_Nigam
Valued Contributor II

Hi @cloud user, as of now we do not support Structured Streaming with Pub/Sub. The sources supported by Structured Streaming are listed here:

https://docs.gcp.databricks.com/spark/latest/structured-streaming/data-sources.html

Kaniz
Community Manager

Hi @cloud user, we haven't heard from you since the last response from @Noopur Nigam, and I am checking back to see whether you have found a resolution. If you have, please share it with the community, as it may help others. Otherwise, we will respond with more details and try to help.

Ajay-Pandey
Esteemed Contributor III

Hi @210573 

Databricks now supports Pub/Sub streaming natively, so you can use it for your use case. For more information, see the official documentation:

PUB/SUB with Databricks
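As a rough sketch of what a read through the native connector might look like, the helpers below wire a streaming read through `spark.readStream.format("pubsub")`. The format name and the option names (`subscriptionId`, `topicId`, `projectId`, plus the service-account fields `clientId`, `clientEmail`, `privateKey`, `privateKeyId`) reflect our reading of the Databricks Pub/Sub documentation and are assumptions to check against the linked page for your runtime version. The helper names and all placeholder values are hypothetical. Note that this is the Pub/Sub connector, not Pub/Sub Lite.

```python
from typing import Any, Dict


def pubsub_auth_options(client_id: str, client_email: str,
                        private_key: str, private_key_id: str) -> Dict[str, str]:
    """Hypothetical helper: collect service-account fields into reader options.

    The option keys are assumed from the Databricks Pub/Sub docs; verify them.
    """
    return {
        "clientId": client_id,
        "clientEmail": client_email,
        "privateKey": private_key,
        "privateKeyId": private_key_id,
    }


def read_pubsub_stream(spark: Any, subscription_id: str, topic_id: str,
                       project_id: str, auth: Dict[str, str]) -> Any:
    """Hypothetical helper: build a streaming DataFrame from Pub/Sub.

    `spark` is the SparkSession that Databricks already provides in notebooks,
    so no new session is created here.
    """
    return (
        spark.readStream
        .format("pubsub")                           # assumed native source name
        .option("subscriptionId", subscription_id)  # assumed required option
        .option("topicId", topic_id)                # assumed required option
        .option("projectId", project_id)            # assumed required option
        .options(**auth)                            # service-account credentials
        .load()
    )
```

In a notebook you would pass the built-in `spark` object plus your own subscription, topic, and project IDs, then start a query with `writeStream` (or `display(df)` on Databricks), since `show()` is not supported on a streaming DataFrame. Keeping the credentials in a separate dict makes it easy to load them from a secret store instead of hard-coding them.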