To count the records published to Pub/Sub in the last 24 hours, filter on the publishTimestampInMillis field in the Pub/Sub schema, which holds the publish time as epoch milliseconds. In Databricks, take current_timestamp(), subtract 24 hours to get the cutoff, convert that cutoff to epoch milliseconds, and keep only the records whose publishTimestampInMillis is at or above it.
Here's an example code snippet that demonstrates how to fetch the last 24 hours' record count from Pub/Sub using Databricks:
import org.apache.spark.sql.functions._

val authOptions: Map[String, String] = Map(
  "clientId" -> clientId,
  "clientEmail" -> clientEmail,
  "privateKey" -> privateKey,
  "privateKeyId" -> privateKeyId
)

val pubsubDF = spark.readStream
  .format("pubsub")
  .option("subscriptionId", "mysub")
  .option("topicId", "mytopic")
  .option("projectId", "myproject")
  .options(authOptions)
  .load()
// publishTimestampInMillis is epoch milliseconds, so build the cutoff in
// milliseconds too (casting a timestamp to long yields seconds, not millis).
val cutoffMillis = (unix_timestamp(current_timestamp()) - lit(86400)) * 1000

// readStream returns a streaming DataFrame, so calling count() and printing
// it directly would fail; express the count as a streaming aggregation and
// start a query instead.
val query = pubsubDF
  .filter(col("publishTimestampInMillis") >= cutoffMillis)
  .agg(count("*").as("last24HoursCount"))
  .writeStream
  .outputMode("complete")
  .format("console")
  .start()

Note that the filter is re-evaluated against current_timestamp() on each micro-batch, so the 24-hour window slides as the stream runs.
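If you want to sanity-check the cutoff arithmetic itself, you can do it in plain Scala without Spark; this is just a sketch of the same epoch-millisecond comparison the filter performs (the timestamps here are made up for illustration):

```scala
// Standalone check of the 24-hour cutoff arithmetic used in the filter.
// publishTimestampInMillis holds epoch milliseconds, so the cutoff must
// also be in milliseconds (seconds * 1000).
val nowMillis: Long = System.currentTimeMillis()
val cutoffMillis: Long = nowMillis - 24L * 60 * 60 * 1000 // 86,400,000 ms

// A record published one hour ago falls inside the window...
val oneHourAgo = nowMillis - 1L * 60 * 60 * 1000
// ...while one published 25 hours ago does not.
val twentyFiveHoursAgo = nowMillis - 25L * 60 * 60 * 1000

println(oneHourAgo >= cutoffMillis)         // true
println(twentyFiveHoursAgo >= cutoffMillis) // false
```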
Note that this code snippet assumes you have already configured the Pub/Sub connector in Databricks and have the necessary authentication credentials. If you haven't done so, refer to "Subscribe to Google Pub/Sub" in the Databricks on Google Cloud documentation for more information.