I am trying to read messages from a Kafka topic using spark.readStream, and I am using the following code to read it.
My CODE:
df = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "192.1xx.1.1xx:9xx")
    .option("subscribe", "json_topic")
    .option("startingOffsets", "earliest")  # read from the beginning of the topic
    .load())
Now I just want to get the count of df, the same way df.count() works when we use spark.read.
I need to apply some conditions if I don't get any messages from the topic. I am running this code as a batch job and it's a business requirement; I don't want to use spark.read.
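To show what I mean, here is a rough sketch of the kind of approach I was considering (the foreachBatch callback and the trigger(once=True) option are just my assumptions about how this might be done; I have not confirmed this is the right way for my case):

def count_batch(batch_df, batch_id):
    # inside foreachBatch, batch_df is a regular (non-streaming) DataFrame, so count() should work
    msg_count = batch_df.count()
    if msg_count == 0:
        print("No messages received from the topic")
    else:
        print("Received {} messages".format(msg_count))

query = (df.writeStream
    .foreachBatch(count_batch)
    .trigger(once=True)  # process whatever data is available, then stop, like a batch run
    .start())
query.awaitTermination()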
Please suggest what would be the best approach to get the count.
Thanks in advance!