<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Spark 3.3.0 connect kafka problem in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/spark-3-3-0-connect-kafka-problem/m-p/33763#M24701</link>
    <description>&lt;P&gt;This is how I can config to run PySpark (scala 2.12 Spark 3.2.1) Structure Streaming with Kafka on jupyter lab (need to download 2 jars file spark-sql-kafka-0-10_2.12-3.2.1.jar, kafka-clients-2.1.1.jar to folder jars)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;spark = SparkSession\&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;.builder\&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;.config("spark.jars", os.getcwd() + "/jars/spark-sql-kafka-0-10_2.12-3.2.1.jar" + "," + os.getcwd() + "/jars/kafka-clients-2.1.1.jar") \&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;.appName("Structured_Redpanda_WordCount")\&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;.getOrCreate()&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;spark.conf.set("spark.sql.shuffle.partitions", 1)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Sun, 06 Nov 2022 16:56:14 GMT</pubDate>
    <dc:creator>Nghiaht1</dc:creator>
    <dc:date>2022-11-06T16:56:14Z</dc:date>
    <item>
      <title>Spark 3.3.0 connect kafka problem</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-3-3-0-connect-kafka-problem/m-p/33762#M24700</link>
      <description>&lt;P&gt;I am trying to connect to my Kafka from spark but getting an error:&lt;/P&gt;&lt;P&gt;Kafka Version: 2.4.1&lt;/P&gt;&lt;P&gt;Spark Version: 3.3.0&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I am using jupyter notebook to execute the pyspark code below:&lt;/P&gt;&lt;P&gt;```&lt;/P&gt;&lt;P&gt;from pyspark.sql.functions import *&lt;/P&gt;&lt;P&gt;from pyspark.sql.types import *&lt;/P&gt;&lt;P&gt;#import library&amp;nbsp;&lt;/P&gt;&lt;P&gt;import os&lt;/P&gt;&lt;P&gt;from pyspark.sql import SparkSession&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.0 pyspark-shell'&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;sc = SparkSession.builder.appName('Pyspark_kafka').getOrCreate()&lt;/P&gt;&lt;P&gt;df = sc \&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.readStream \&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.format("kafka") \&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.option("kafka.bootstrap.servers", "zonos.engrid.in:9092") \&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.option("subscribe", "ext_device-event_10121") \&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.option("startingOffsets", "earliest") \&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.option("endingOffsets", "latest") \&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.load()&lt;/P&gt;&lt;P&gt;```&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Which gives me the following error:&lt;/P&gt;&lt;P&gt;```&lt;/P&gt;&lt;P&gt;---------------------------------------------------------------&lt;/P&gt;&lt;P&gt;AnalysisException&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Traceback (most recent call last)&lt;/P&gt;&lt;P&gt;&amp;lt;ipython-input-18-409d93832e70&amp;gt; in &amp;lt;module&amp;gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;5&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.option("subscribe", "ext_device-event_10121") \&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;6&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.option("startingOffsets", "earliest") \&lt;/P&gt;&lt;P&gt;----&amp;gt; 7&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.option("endingOffsets", "latest") \&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;8&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;.load()&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/opt/spark/spark-3.3.0-bin-hadoop3/python/pyspark/sql/streaming.py in load(self, path, format, schema, **options)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;467&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return self._df(self._jreader.load(path))&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;468&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;else:&lt;/P&gt;&lt;P&gt;--&amp;gt; 469&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return self._df(self._jreader.load())&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;470&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;471&amp;nbsp;&amp;nbsp;&amp;nbsp;def json(&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/usr/local/lib/python3.6/site-packages/py4j/java_gateway.py in __call__(self, *args)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;1255&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;answer = self.gateway_client.send_command(command)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;1256&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return_value = get_return_value(&lt;/P&gt;&lt;P&gt;-&amp;gt; 1257&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;answer, self.gateway_client, self.target_id, self.name)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;1258&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;1259&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;for temp_arg in temp_args:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/opt/spark/spark-3.3.0-bin-hadoop3/python/pyspark/sql/utils.py in deco(*a, **kw)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;194&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Hide where the exception came from that shows a non-Pythonic&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;195&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# JVM exception message.&lt;/P&gt;&lt;P&gt;--&amp;gt; 196&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;raise converted from None&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;197&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;else:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;198&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;raise&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;AnalysisException:&amp;nbsp;Failed to find data source: kafka. Please deploy the application as per the deployment section of "Structured Streaming + Kafka Integration Guide".&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;/P&gt;&lt;P&gt;```&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Not sure what's wrong with the connector or the code please help me out.&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Tue, 23 Aug 2022 11:40:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-3-3-0-connect-kafka-problem/m-p/33762#M24700</guid>
      <dc:creator>avnish26</dc:creator>
      <dc:date>2022-08-23T11:40:36Z</dc:date>
    </item>
    <item>
      <title>Re: Spark 3.3.0 connect kafka problem</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-3-3-0-connect-kafka-problem/m-p/33763#M24701</link>
      <description>&lt;P&gt;This is how I can config to run PySpark (scala 2.12 Spark 3.2.1) Structure Streaming with Kafka on jupyter lab (need to download 2 jars file spark-sql-kafka-0-10_2.12-3.2.1.jar, kafka-clients-2.1.1.jar to folder jars)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;spark = SparkSession\&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;.builder\&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;.config("spark.jars", os.getcwd() + "/jars/spark-sql-kafka-0-10_2.12-3.2.1.jar" + "," + os.getcwd() + "/jars/kafka-clients-2.1.1.jar") \&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;.appName("Structured_Redpanda_WordCount")\&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;.getOrCreate()&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;spark.conf.set("spark.sql.shuffle.partitions", 1)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 06 Nov 2022 16:56:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-3-3-0-connect-kafka-problem/m-p/33763#M24701</guid>
      <dc:creator>Nghiaht1</dc:creator>
      <dc:date>2022-11-06T16:56:14Z</dc:date>
    </item>
    <item>
      <title>Re: Spark 3.3.0 connect kafka problem</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-3-3-0-connect-kafka-problem/m-p/33764#M24702</link>
      <description>&lt;P&gt;In my case it solved, just download the jars.&lt;/P&gt;</description>
      <pubDate>Wed, 28 Dec 2022 19:25:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-3-3-0-connect-kafka-problem/m-p/33764#M24702</guid>
      <dc:creator>weldermartins</dc:creator>
      <dc:date>2022-12-28T19:25:04Z</dc:date>
    </item>
    <item>
      <title>Re: Spark 3.3.0 connect kafka problem</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-3-3-0-connect-kafka-problem/m-p/56320#M30518</link>
      <description>&lt;P&gt;Hi,&lt;BR /&gt;&lt;BR /&gt;What jar has been added and under which location and how to integrated jar with code?&lt;/P&gt;</description>
      <pubDate>Tue, 02 Jan 2024 22:39:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-3-3-0-connect-kafka-problem/m-p/56320#M30518</guid>
      <dc:creator>dilip4dd</dc:creator>
      <dc:date>2024-01-02T22:39:43Z</dc:date>
    </item>
    <item>
      <title>Re: Spark 3.3.0 connect kafka problem</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-3-3-0-connect-kafka-problem/m-p/56390#M30536</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/46015"&gt;@avnish26&lt;/a&gt;, did you added the Jar files to the cluster? do you still have issues? please let us know&lt;/P&gt;</description>
      <pubDate>Wed, 03 Jan 2024 22:23:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-3-3-0-connect-kafka-problem/m-p/56390#M30536</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2024-01-03T22:23:22Z</dc:date>
    </item>
    <item>
      <title>Re: Spark 3.3.0 connect kafka problem</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-3-3-0-connect-kafka-problem/m-p/100538#M40328</link>
      <description>&lt;P&gt;import findspark&lt;BR /&gt;findspark.init()&lt;BR /&gt;from pyspark.sql import SparkSession&lt;BR /&gt;spark = SparkSession.builder \&lt;BR /&gt;.appName("kafkaStream") \&lt;BR /&gt;.config("spark.jars.packages", "org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.3") \&lt;BR /&gt;.getOrCreate()&lt;BR /&gt;# Read data from Kafka topic&lt;BR /&gt;df = spark.readStream.format("kafka") \&lt;BR /&gt;.option("kafka.bootstrap.servers", "localhost:9092") \&lt;BR /&gt;.option("subscribe", "firstStream") \&lt;BR /&gt;.option("startingsets","latest")\&lt;BR /&gt;.load()\&lt;BR /&gt;.selectExpr("CAST(value as STRING)")&lt;BR /&gt;------------------------------&lt;BR /&gt;I have the same error ,but keep in mind spark version is 3.5.3&lt;/P&gt;</description>
      <pubDate>Sun, 01 Dec 2024 22:04:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-3-3-0-connect-kafka-problem/m-p/100538#M40328</guid>
      <dc:creator>Assem</dc:creator>
      <dc:date>2024-12-01T22:04:38Z</dc:date>
    </item>
  </channel>
</rss>

