Using streaming data received from Pub/sub topic

sumitdesai — Tue, 06 Feb 2024 13:04:24 GMT

I have a Databricks notebook in which I am streaming from a Pub/Sub topic. The code looks like this:

%pip install --upgrade google-cloud-pubsub[pandas]    
from pyspark.sql import SparkSession

authOptions = {"clientId": "123", "clientEmail": "123@project-id.iam.gserviceaccount.com", "privateKey": "-----BEGIN PRIVATE KEY-----1234-----END PRIVATE KEY-----\n", "privateKeyId": "1234"}
stream = (spark.readStream.format("pubsub")
    .option("subscriptionId", "firstfuel-reporting-test-subscription").option("topicId", "firstfuel-reporting-test")
    .option("projectId", "project-id").options(**authOptions).load())
decodedStream = stream.withColumn("decodedData", stream["payload"].cast("string"))
result = decodedStream.writeStream.outputMode("append").format("console").start()

When I run this, the stream starts successfully and any messages published to the Pub/Sub topic are acknowledged right away. However, I cannot see the actual payload printed on the console. How can I do that? And if I need to use the received messages for some other purpose, how can I do that? I have attached a screenshot of what I see after the stream starts.
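
For reference, here is a rough sketch of the two directions I am considering, not something I have confirmed works. It assumes that the console sink writes to the driver log rather than to the notebook cell, that the payload column holds UTF-8 bytes, and that decodedStream is the DataFrame from my code above. The function names and the "pubsub_messages" table name are made up for illustration.

```python
def decode_payload(raw):
    """Plain-Python counterpart of casting the binary payload to a string (UTF-8 assumed)."""
    return None if raw is None else bytes(raw).decode("utf-8")


def handle_batch(batch_df, batch_id):
    """Hypothetical foreachBatch handler; batch_df is a regular (non-streaming) DataFrame."""
    # collect() pulls every row to the driver, so this is only suitable for small test volumes.
    for row in batch_df.select("payload").collect():
        message = decode_payload(row["payload"])
        # Use `message` here, e.g. parse it as JSON or pass it to other code.


def start_memory_query(decoded_stream, name="pubsub_messages"):
    """Write the stream to an in-memory table that can be queried from the notebook."""
    return (decoded_stream.writeStream
            .outputMode("append")
            .format("memory")
            .queryName(name)
            .start())


def start_batch_query(decoded_stream):
    """Hand each micro-batch to handle_batch so the messages can be reused."""
    return decoded_stream.writeStream.foreachBatch(handle_batch).start()
```

In the notebook I would call start_memory_query(decodedStream) and then run spark.sql("SELECT decodedData FROM pubsub_messages").show(truncate=False) to look at the payloads, or call start_batch_query(decodedStream) to process each micro-batch in my own code.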
