Using streaming data received from a Pub/Sub topic
02-06-2024 05:04 AM
I have a Databricks notebook in which I am streaming from a Pub/Sub topic. The code looks like this:
%pip install --upgrade google-cloud-pubsub[pandas]

from pyspark.sql import SparkSession

authOptions = {
    "clientId": "123",
    "clientEmail": "123@project-id.iam.gserviceaccount.com",
    "privateKey": "-----BEGIN PRIVATE KEY-----1234-----END PRIVATE KEY-----\n",
    "privateKeyId": "1234",
}

stream = (
    spark.readStream.format("pubsub")
    .option("subscriptionId", "firstfuel-reporting-test-subscription")
    .option("topicId", "firstfuel-reporting-test")
    .option("projectId", "project-id")
    .options(**authOptions)
    .load()
)

decodedStream = stream.withColumn("decodedData", stream["payload"].cast("string"))

result = decodedStream.writeStream.outputMode("append").format("console").start()
When I run this, the stream starts successfully and any messages published to the Pub/Sub topic are acknowledged right away. However, I cannot see the actual payload printed on the console. How can I do that? And if I want to use the received messages for any other purpose, how can I do that? I am attaching a view of what I see after the stream starts below.
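One possible explanation, offered as an assumption rather than a confirmed diagnosis: on Databricks the `console` sink typically writes to the driver's stdout log rather than to the notebook cell, so the decoded payload may not show up under the cell. A minimal sketch of one alternative is to hand each micro-batch to a Python function with `foreachBatch`, which gives a place both to inspect the payload and to use the messages for other purposes. The names `handle_batch`, `start_payload_query`, and `received` are hypothetical. `stream` is the streaming DataFrame from the code above and `payload` is the column already used there. The sketch also assumes a cluster where the batch function runs in the driver's Python process.

```python
# Sketch only: handle_batch, start_payload_query and received are
# hypothetical names introduced for illustration.

received = []  # driver-side list, used here only to show downstream use


def handle_batch(batch_df, batch_id):
    """Called once per micro-batch with an ordinary (non-streaming) DataFrame."""
    # Cast the binary payload column to a readable string.
    decoded = batch_df.selectExpr("CAST(payload AS STRING) AS decodedData")
    # collect() brings every row to the driver, so this suits small test volumes only.
    for row in decoded.collect():
        print(f"batch {batch_id}: {row['decodedData']}")
        received.append(row["decodedData"])


def start_payload_query(stream):
    """Attach handle_batch to the streaming DataFrame and start the query."""
    return stream.writeStream.foreachBatch(handle_batch).start()
```

In the notebook this would be started with `query = start_payload_query(stream)`. The `print` output may still land in the driver log, so inspecting `received` from another cell, or writing the batch to a table inside `handle_batch`, is likely the more dependable way to use the messages. In a Databricks notebook, `display(decodedStream)` is another option for viewing the decoded rows inline.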