<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How to write data to Confluent Kafka with SchemaRegistry format on sparkstructured? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-write-data-to-confluent-kafka-with-schemaregistry-format/m-p/73080#M34638</link>
    <description>&lt;P&gt;Hi There!&lt;BR /&gt;I am trying to write batch data to a Kafka topic with schema registry in databricks using pyspark, I serialize the data with pyspark to_avro function and write it to the topic, but the consumers can’t read the schema id. If they do not separate the schema_id first 5 bytes they can read the data as well.&lt;BR /&gt;I read the avro schema from .avsc file downloaded in confluent, and its version is ok.&lt;/P&gt;&lt;P&gt;This is my script:&lt;/P&gt;&lt;P&gt;df.select(to_avro(carrossel_schema, schema_str)).alias("value") \&lt;BR /&gt;.write \&lt;BR /&gt;.format('kafka') \&lt;BR /&gt;.option("kafka.bootstrap.servers", confluent_server) \&lt;BR /&gt;.option("topic", topico) \&lt;BR /&gt;.option("kafka.security.protocol", "SASL_SSL") \&lt;BR /&gt;.option(&lt;BR /&gt;"kafka.sasl.jaas.config",&lt;BR /&gt;f"kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required username='{confluent_user}' password='{confluent_pass}';"&lt;BR /&gt;) \&lt;BR /&gt;.option("kafka.ssl.endpoint.identification.algorithm", "https") \&lt;BR /&gt;.option("kafka.sasl.mechanism", "PLAIN") \&lt;BR /&gt;.save()&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;return df&lt;/P&gt;</description>
    <pubDate>Wed, 12 Jun 2024 20:01:41 GMT</pubDate>
    <dc:creator>GCosta</dc:creator>
    <dc:date>2024-06-12T20:01:41Z</dc:date>
    <item>
      <title>How to write data to Confluent Kafka with SchemaRegistry format on sparkstructured?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-write-data-to-confluent-kafka-with-schemaregistry-format/m-p/73080#M34638</link>
      <description>&lt;P&gt;Hi There!&lt;BR /&gt;I am trying to write batch data to a Kafka topic with schema registry in databricks using pyspark, I serialize the data with pyspark to_avro function and write it to the topic, but the consumers can’t read the schema id. If they do not separate the schema_id first 5 bytes they can read the data as well.&lt;BR /&gt;I read the avro schema from .avsc file downloaded in confluent, and its version is ok.&lt;/P&gt;&lt;P&gt;This is my script:&lt;/P&gt;&lt;P&gt;df.select(to_avro(carrossel_schema, schema_str)).alias("value") \&lt;BR /&gt;.write \&lt;BR /&gt;.format('kafka') \&lt;BR /&gt;.option("kafka.bootstrap.servers", confluent_server) \&lt;BR /&gt;.option("topic", topico) \&lt;BR /&gt;.option("kafka.security.protocol", "SASL_SSL") \&lt;BR /&gt;.option(&lt;BR /&gt;"kafka.sasl.jaas.config",&lt;BR /&gt;f"kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required username='{confluent_user}' password='{confluent_pass}';"&lt;BR /&gt;) \&lt;BR /&gt;.option("kafka.ssl.endpoint.identification.algorithm", "https") \&lt;BR /&gt;.option("kafka.sasl.mechanism", "PLAIN") \&lt;BR /&gt;.save()&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;return df&lt;/P&gt;</description>
      <pubDate>Wed, 12 Jun 2024 20:01:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-write-data-to-confluent-kafka-with-schemaregistry-format/m-p/73080#M34638</guid>
      <dc:creator>GCosta</dc:creator>
      <dc:date>2024-06-12T20:01:41Z</dc:date>
    </item>
  </channel>
</rss>

