Databricks Pub-Sub Data Recon
07-07-2023 12:21 AM
I am trying to set up a reconciliation (recon) activity between GCP Pub/Sub and Databricks. Is there any way to fetch the record count for the last 24 hours from Pub/Sub?
I tried but could not find a direct solution. It would be great if anyone could suggest a way to achieve it.
#pubsub #databricks
- Labels: Azure databricks, pubsub
07-07-2023 03:25 PM
To fetch the record count for the last 24 hours from Pub/Sub, you can filter on the publishTimestampInMillis field in the Pub/Sub schema, which holds each record's publish timestamp. Work out the timestamp for 24 hours ago, then use the filter() function to keep only the records whose publishTimestampInMillis is at or after it, and count what remains.
Here's an example code snippet that demonstrates how to fetch the last 24 hours' record count from Pub/Sub using Databricks:
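import org.apache.spark.sql.functions._

// clientId, clientEmail, privateKey, and privateKeyId must already be defined
val authOptions: Map[String, String] =
  Map("clientId" -> clientId,
      "clientEmail" -> clientEmail,
      "privateKey" -> privateKey,
      "privateKeyId" -> privateKeyId)

val pubsubDF = spark.readStream
  .format("pubsub")
  .option("subscriptionId", "mysub")
  .option("topicId", "mytopic")
  .option("projectId", "myproject")
  .options(authOptions)
  .load()

// publishTimestampInMillis is in epoch milliseconds, so the cutoff must be in milliseconds too
val last24HoursCutoffMillis = System.currentTimeMillis() - 24L * 60 * 60 * 1000

// count() as an action is not supported on a streaming DataFrame,
// so aggregate with groupBy().count() and write the running total to a sink
val last24HoursCount = pubsubDF
  .filter(col("publishTimestampInMillis") >= last24HoursCutoffMillis)
  .groupBy()
  .count()

val query = last24HoursCount.writeStream
  .outputMode("complete")
  .format("memory")
  .queryName("last_24_hours_count")
  .start()

// The table is empty until the first micro-batch completes
spark.sql("SELECT * FROM last_24_hours_count").show()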
Note that this code snippet assumes that you have already configured the Pub/Sub connector in Databricks and have the necessary authorization options. If you haven't done so, please refer to the "Subscribe to Google Pub/Sub" page in the Databricks on Google Cloud documentation for more information.
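The 24-hour window arithmetic is easy to get wrong, because publishTimestampInMillis is expressed in epoch milliseconds while many timestamp conversions yield seconds. The following is a minimal sketch of the cutoff calculation in plain Scala, with no Spark involved. The object and method names (Last24hWindow, cutoffMillis, countInWindow) are hypothetical and exist only for illustration.

```scala
// Hypothetical helper, for illustration only: plain Scala, no Spark required.
object Last24hWindow {
  // Length of the window in milliseconds (24 hours).
  val DayMillis: Long = 24L * 60 * 60 * 1000

  // Start of a 24-hour window that ends at nowMillis (epoch milliseconds).
  def cutoffMillis(nowMillis: Long): Long = nowMillis - DayMillis

  // Count the publish timestamps that fall inside [cutoff, now], both ends inclusive.
  def countInWindow(publishTimestamps: Seq[Long], nowMillis: Long): Long = {
    val cutoff = cutoffMillis(nowMillis)
    publishTimestamps.count(ts => ts >= cutoff && ts <= nowMillis).toLong
  }
}
```

Passing a fixed nowMillis instead of reading the clock inside the helper keeps the window reproducible, which matters when the same window has to be applied on both sides of a reconciliation.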
07-13-2023 10:57 PM
Hi @Prabakar
Thanks for the quick reply. I am looking for a direct record count on the Pub/Sub side, not in Databricks: we have to verify how many records were in Pub/Sub and how many of them we received in Databricks over the last 24 hours.
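The comparison itself can be sketched as follows, once both numbers exist for the same 24-hour window. This is only an illustration under stated assumptions: the Pub/Sub-side count is assumed to be obtained outside Databricks (how to obtain it is the open question here), and ReconResult and Recon.compare are hypothetical names, not part of any Databricks or Google Cloud API.

```scala
// Hypothetical types, for illustration only.
final case class ReconResult(published: Long, received: Long) {
  // Positive: records published but not (yet) received in Databricks.
  // Negative: more records received than published, e.g. duplicates or mismatched windows.
  def missing: Long = published - received

  // True when both sides report the same number of records.
  def matches: Boolean = missing == 0L
}

object Recon {
  // published: count reported for the Pub/Sub side (assumed to be supplied by the caller).
  // received: count of records landed in Databricks for the same window.
  def compare(published: Long, received: Long): ReconResult = {
    require(published >= 0 && received >= 0, "counts must be non-negative")
    ReconResult(published, received)
  }
}
```

For example, Recon.compare(1000L, 998L).missing evaluates to 2, which would indicate two records not yet accounted for in Databricks.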
07-12-2023 02:37 AM
Hi @Ajay-Pandey
Hope you are well. Just wanted to check whether you were able to find an answer to your question and, if so, whether you would like to mark an answer as best. It would be really helpful for the other members too.
Cheers!

