<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How to get the count of dataframe rows when reading through spark.readstream using batch jobs? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-get-the-count-of-dataframe-rows-when-reading-through/m-p/15332#M9664</link>
    <description>&lt;P&gt;I am trying to read messages from a Kafka topic using &lt;B&gt;&lt;I&gt;spark.readStream&lt;/I&gt;&lt;/B&gt;, with the following code.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;&lt;U&gt;My code:&lt;/U&gt;&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;&lt;I&gt;df = (spark.readStream&lt;/I&gt;&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;&lt;I&gt;        .format("kafka")&lt;/I&gt;&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;&lt;I&gt;        .option("kafka.bootstrap.servers", "192.1xx.1.1xx:9xx")&lt;/I&gt;&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;&lt;I&gt;        .option("subscribe", "json_topic")&lt;/I&gt;&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;&lt;I&gt;        .option("startingOffsets", "earliest")  # read from the earliest offset&lt;/I&gt;&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;&lt;I&gt;        .load())&lt;/I&gt;&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Now I just want to get the row count of &lt;B&gt;&lt;I&gt;df&lt;/I&gt;&lt;/B&gt;, the way &lt;B&gt;df.count()&lt;/B&gt; works when we use &lt;B&gt;spark.read&lt;/B&gt;.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I need to add some conditional logic for the case where no messages are received from the topic. I am running this code as a batch job, which is a business requirement, and I don't want to use &lt;B&gt;spark.read&lt;/B&gt;.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Please suggest the best approach to get the count.&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Thanks in advance!&lt;/B&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 21 Dec 2022 16:29:40 GMT</pubDate>
    <dc:creator>SRK</dc:creator>
    <dc:date>2022-12-21T16:29:40Z</dc:date>
    <item>
      <title>How to get the count of dataframe rows when reading through spark.readstream using batch jobs?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-get-the-count-of-dataframe-rows-when-reading-through/m-p/15332#M9664</link>
      <description>&lt;P&gt;I am trying to read messages from a Kafka topic using &lt;B&gt;&lt;I&gt;spark.readStream&lt;/I&gt;&lt;/B&gt;, with the following code.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;&lt;U&gt;My code:&lt;/U&gt;&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;&lt;I&gt;df = (spark.readStream&lt;/I&gt;&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;&lt;I&gt;        .format("kafka")&lt;/I&gt;&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;&lt;I&gt;        .option("kafka.bootstrap.servers", "192.1xx.1.1xx:9xx")&lt;/I&gt;&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;&lt;I&gt;        .option("subscribe", "json_topic")&lt;/I&gt;&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;&lt;I&gt;        .option("startingOffsets", "earliest")  # read from the earliest offset&lt;/I&gt;&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;&lt;I&gt;        .load())&lt;/I&gt;&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Now I just want to get the row count of &lt;B&gt;&lt;I&gt;df&lt;/I&gt;&lt;/B&gt;, the way &lt;B&gt;df.count()&lt;/B&gt; works when we use &lt;B&gt;spark.read&lt;/B&gt;.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I need to add some conditional logic for the case where no messages are received from the topic. I am running this code as a batch job, which is a business requirement, and I don't want to use &lt;B&gt;spark.read&lt;/B&gt;.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Please suggest the best approach to get the count.&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Thanks in advance!&lt;/B&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 21 Dec 2022 16:29:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-get-the-count-of-dataframe-rows-when-reading-through/m-p/15332#M9664</guid>
      <dc:creator>SRK</dc:creator>
      <dc:date>2022-12-21T16:29:40Z</dc:date>
    </item>
    <item>
      <title>Re: How to get the count of dataframe rows when reading through spark.readstream using batch jobs?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-get-the-count-of-dataframe-rows-when-reading-through/m-p/15333#M9665</link>
      <description>&lt;P&gt;You can try this approach:&lt;/P&gt;&lt;P&gt;&lt;A href="https://stackoverflow.com/questions/57568038/how-to-see-the-dataframe-in-the-console-equivalent-of-show-for-structured-st/62161733#62161733" target="test_blank"&gt;https://stackoverflow.com/questions/57568038/how-to-see-the-dataframe-in-the-console-equivalent-of-show-for-structured-st/62161733#62161733&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;A query built on readStream runs on a background thread, so there is no simple equivalent of df.show().&lt;/P&gt;</description>
      <pubDate>Thu, 22 Dec 2022 13:13:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-get-the-count-of-dataframe-rows-when-reading-through/m-p/15333#M9665</guid>
      <dc:creator>daniel_sahal</dc:creator>
      <dc:date>2022-12-22T13:13:54Z</dc:date>
    </item>
    <item>
      <title>Re: How to get the count of dataframe rows when reading through spark.readstream using batch jobs?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-get-the-count-of-dataframe-rows-when-reading-through/m-p/15334#M9666</link>
      <description>&lt;P&gt;Thanks for the suggestion. I will check.&lt;/P&gt;</description>
      <pubDate>Fri, 23 Dec 2022 06:07:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-get-the-count-of-dataframe-rows-when-reading-through/m-p/15334#M9666</guid>
      <dc:creator>SRK</dc:creator>
      <dc:date>2022-12-23T06:07:16Z</dc:date>
    </item>
  </channel>
</rss>

