Failures Streaming data to Pulsar

surband
New Contributor III

I am encountering the following exception when attempting to stream data to a Pulsar topic. This is a first-time implementation, so any ideas for resolving this are greatly appreciated.

DBR: 14.3 LTS ML (includes Apache Spark 3.5.0, Scala 2.12)

1 Driver: 64 GB Memory, 16 Cores
Runtime: 14.3.x-cpu-ml-scala2.12

Exception:

Caused by: java.lang.NoSuchMethodError: org.apache.spark.sql.types.StructType.toAttributes()Lscala/collection/Seq;
at org.apache.spark.sql.pulsar.PulsarSink.addBatch(PulsarSinks.scala:47)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.addBatch(MicroBatchExecution.scala:1236)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$19(MicroBatchExecution.scala:1465)
Code:

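import org.apache.spark.sql.functions.{col, struct, to_json}
import org.apache.spark.sql.streaming.Trigger

// Serialize each row to a JSON string in a single "value" column, then stream it to Pulsar.
val query = sourceDF
  .select(to_json(struct(col("*"))).alias("value"))
  .writeStream
  .format("pulsar")
  .option("service.url", pulsarServiceUrl)
  .option("topic", pulsarTopic)
  .option("checkpointLocation", checkpointLocation)
  .trigger(Trigger.ProcessingTime("10 seconds"))
  .start()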
Accepted Solutions

shan_chandra
Databricks Employee

@surband - The Pulsar connector is in public preview in DBR. Only reads from Pulsar sources are supported. We will follow up with engineering about write support for Pulsar.

https://docs.databricks.com/en/connect/streaming/pulsar.html#stream-from-apache-pulsar
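
For reference, the supported direction (reading from Pulsar) might look like the minimal sketch below. It assumes a Databricks notebook where `spark` is predefined, and it reuses the `service.url` and `topic` option names from the question's code. The broker URL and topic name are placeholders, not values from this thread, so check the linked documentation for the exact options your runtime accepts.

```scala
// Placeholder values (assumptions): replace with your own broker URL and topic.
val pulsarServiceUrl = "pulsar://broker.example.com:6650"
val pulsarTopic = "my-topic"

// `spark` is the SparkSession that a Databricks notebook predefines.
// This builds a streaming DataFrame that reads from the Pulsar topic.
val pulsarDF = spark.readStream
  .format("pulsar")
  .option("service.url", pulsarServiceUrl)
  .option("topic", pulsarTopic)
  .load()
```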

7 REPLIES

shan_chandra
Databricks Employee

Hi @surband - can you please share the full error stack trace? Also, please use a compatible standard DBR (Spark) version instead of the ML runtime. Please refer to the document below and verify that the necessary connector libraries have been added to the cluster.

https://docs.streamnative.io/hub/data-processing-pulsar-spark-3.2
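
A `NoSuchMethodError` at runtime usually means a library was compiled against a different version of a dependency than the one actually on the classpath. One quick way to check this from a notebook is to look for the method named in the stack trace via reflection. The sketch below is only a diagnostic aid; `hasMethod` is a hypothetical helper name introduced for illustration, not part of Spark or the connector.

```scala
import scala.util.Try

// Hypothetical helper: true if the named class can be loaded and exposes
// a public method with the given name; false if either is missing.
def hasMethod(className: String, methodName: String): Boolean =
  Try(Class.forName(className)).toOption
    .exists(_.getMethods.exists(_.getName == methodName))

// Check for the method that the stack trace reports as missing.
println("StructType.toAttributes present: " +
  hasMethod("org.apache.spark.sql.types.StructType", "toAttributes"))
```

If this prints `false` on the cluster, the connector was built against a Spark version whose `StructType` differs from the one in the runtime. Printing `spark.version` alongside it shows which Spark build the cluster is running.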

surband
New Contributor III

Please see the attached log files and screenshot of the DBR selection. The runtime I selected is one of the options in the dropdown. I can't tell from the DBR which version of the StreamNative connector is used underneath.

shan_chandra
Databricks Employee

@surband - The Databricks Runtime version has a dropdown when you edit the cluster. There will be two options: Standard and ML. Could you please let us know whether you have added the Spark Pulsar connector to the cluster libraries?

Per the documentation here, Structured Streaming provides exactly-once processing semantics for data read from Pulsar sources.

 

surband
New Contributor III

Hello Shan_chandra, I tried with the Standard DBR as suggested but got the same result. As I understand it, the Spark Pulsar connector comes preinstalled in DBR. I have not explicitly installed anything, and I did not see any documentation on how to do so. The attached image "streamnative-pulsar.png" is a screenshot of the Environment tab showing that the connector is available on the classpath.

surband
New Contributor III

@shan_chandra any suggestions?

surband
New Contributor III

Logs attached
