<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: SDP continuous mode in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/sdp-continuous-mode/m-p/160168#M54862</link>
    <description>&lt;P&gt;Yes, you can build a continuously running streaming pipeline with open source Spark. The key is to use Spark Structured Streaming rather than a batch read: read from Kafka with spark.readStream (not spark.read), write with writeStream, and keep the query alive with awaitTermination(). With the default trigger, the query runs as back-to-back micro-batches, so new Kafka records are picked up as they arrive.&lt;/P&gt;&lt;P&gt;Sample code (Python):&lt;/P&gt;&lt;P&gt;from pyspark.sql.functions import col&lt;/P&gt;&lt;P&gt;# Read continuously from Kafka (assumes an existing SparkSession named spark)&lt;BR /&gt;kafka_df = (&lt;BR /&gt;spark.readStream&lt;BR /&gt;.format("kafka")&lt;BR /&gt;.option("kafka.bootstrap.servers", "kafka:9092")&lt;BR /&gt;.option("subscribe", "your_topic_name")&lt;BR /&gt;.option("startingOffsets", "latest")&lt;BR /&gt;.load()&lt;BR /&gt;)&lt;/P&gt;&lt;P&gt;# Convert Kafka key/value from binary to string&lt;BR /&gt;parsed_df = (&lt;BR /&gt;kafka_df&lt;BR /&gt;.select(&lt;BR /&gt;col("key").cast("string").alias("key"),&lt;BR /&gt;col("value").cast("string").alias("value"),&lt;BR /&gt;col("timestamp")&lt;BR /&gt;)&lt;BR /&gt;)&lt;/P&gt;&lt;P&gt;# Write stream output&lt;BR /&gt;query = (&lt;BR /&gt;parsed_df.writeStream&lt;BR /&gt;.format("delta") # or "parquet"; Delta needs the delta-spark package on open source Spark&lt;BR /&gt;.outputMode("append")&lt;BR /&gt;.option("checkpointLocation", "/tmp/checkpoints/kafka_stream_poc")&lt;BR /&gt;.option("path", "/tmp/output/kafka_stream_poc") # placeholder output path; file-based sinks require one&lt;BR /&gt;.start()&lt;BR /&gt;)&lt;/P&gt;&lt;P&gt;# Keeps the streaming job alive&lt;BR /&gt;query.awaitTermination()&lt;/P&gt;</description>
    <pubDate>Tue, 23 Jun 2026 02:55:37 GMT</pubDate>
    <dc:creator>bala_sai</dc:creator>
    <dc:date>2026-06-23T02:55:37Z</dc:date>
    <item>
      <title>SDP continuous mode</title>
      <link>https://community.databricks.com/t5/data-engineering/sdp-continuous-mode/m-p/160156#M54861</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I was doing a POC with open source Spark and Kafka in a Docker container and got it working. The sample code ingests data from Kafka, but it runs only in batch mode; I am not able to continuously ingest the Kafka stream.&lt;/P&gt;&lt;P&gt;Question: Can we create a continuous streaming pipeline using open source Spark?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Mon, 22 Jun 2026 20:21:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/sdp-continuous-mode/m-p/160156#M54861</guid>
      <dc:creator>lachu</dc:creator>
      <dc:date>2026-06-22T20:21:42Z</dc:date>
    </item>
    <item>
      <title>Re: SDP continuous mode</title>
      <link>https://community.databricks.com/t5/data-engineering/sdp-continuous-mode/m-p/160168#M54862</link>
      <description>&lt;P&gt;Yes, you can build a continuously running streaming pipeline with open source Spark. The key is to use Spark Structured Streaming rather than a batch read: read from Kafka with spark.readStream (not spark.read), write with writeStream, and keep the query alive with awaitTermination(). With the default trigger, the query runs as back-to-back micro-batches, so new Kafka records are picked up as they arrive.&lt;/P&gt;&lt;P&gt;Sample code (Python):&lt;/P&gt;&lt;P&gt;from pyspark.sql.functions import col&lt;/P&gt;&lt;P&gt;# Read continuously from Kafka (assumes an existing SparkSession named spark)&lt;BR /&gt;kafka_df = (&lt;BR /&gt;spark.readStream&lt;BR /&gt;.format("kafka")&lt;BR /&gt;.option("kafka.bootstrap.servers", "kafka:9092")&lt;BR /&gt;.option("subscribe", "your_topic_name")&lt;BR /&gt;.option("startingOffsets", "latest")&lt;BR /&gt;.load()&lt;BR /&gt;)&lt;/P&gt;&lt;P&gt;# Convert Kafka key/value from binary to string&lt;BR /&gt;parsed_df = (&lt;BR /&gt;kafka_df&lt;BR /&gt;.select(&lt;BR /&gt;col("key").cast("string").alias("key"),&lt;BR /&gt;col("value").cast("string").alias("value"),&lt;BR /&gt;col("timestamp")&lt;BR /&gt;)&lt;BR /&gt;)&lt;/P&gt;&lt;P&gt;# Write stream output&lt;BR /&gt;query = (&lt;BR /&gt;parsed_df.writeStream&lt;BR /&gt;.format("delta") # or "parquet"; Delta needs the delta-spark package on open source Spark&lt;BR /&gt;.outputMode("append")&lt;BR /&gt;.option("checkpointLocation", "/tmp/checkpoints/kafka_stream_poc")&lt;BR /&gt;.option("path", "/tmp/output/kafka_stream_poc") # placeholder output path; file-based sinks require one&lt;BR /&gt;.start()&lt;BR /&gt;)&lt;/P&gt;&lt;P&gt;# Keeps the streaming job alive&lt;BR /&gt;query.awaitTermination()&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jun 2026 02:55:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/sdp-continuous-mode/m-p/160168#M54862</guid>
      <dc:creator>bala_sai</dc:creator>
      <dc:date>2026-06-23T02:55:37Z</dc:date>
    </item>
    <item>
      <title>Re: SDP continuous mode</title>
      <link>https://community.databricks.com/t5/data-engineering/sdp-continuous-mode/m-p/160175#M54865</link>
      <description>&lt;P&gt;Hmm. This looks more like imperative programming than SDP.&lt;/P&gt;&lt;P&gt;Would you be able to give me a sample with @dp.table?&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jun 2026 04:29:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/sdp-continuous-mode/m-p/160175#M54865</guid>
      <dc:creator>lachu</dc:creator>
      <dc:date>2026-06-23T04:29:13Z</dc:date>
    </item>
  </channel>
</rss>

