Hi There!
I am trying to write batch data to a Kafka topic with Schema Registry in Databricks using PySpark. I serialize the data with PySpark's to_avro function and write it to the topic, but the consumers can't read the schema id. If they skip splitting off the first 5 bytes (the schema_id header) and just deserialize the payload as plain Avro, they can read the data fine.
I read the Avro schema from an .avsc file downloaded from Confluent, and its version is correct.
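From what I understand, the consumers expect the Confluent wire format, where each message starts with a magic byte (0x00) and a 4-byte big-endian schema id before the Avro payload, and PySpark's to_avro alone does not add that header. A minimal sketch of the header I believe is expected (schema_id here is a hypothetical placeholder for the id registered in Schema Registry):

import struct

# Confluent wire format: 1 magic byte (0x00) + 4-byte big-endian schema id + Avro body
MAGIC_BYTE = 0

def prepend_confluent_header(avro_bytes, schema_id):
    # ">bI" packs the magic byte and the schema id in big-endian order
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_bytes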
This is my script:
# to_avro(...) produces raw Avro bytes, without the Confluent 5-byte header
df.select(to_avro(carrossel_schema, schema_str).alias("value")) \
    .write \
    .format("kafka") \
    .option("kafka.bootstrap.servers", confluent_server) \
    .option("topic", topico) \
    .option("kafka.security.protocol", "SASL_SSL") \
    .option(
        "kafka.sasl.jaas.config",
        f"kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required username='{confluent_user}' password='{confluent_pass}';"
    ) \
    .option("kafka.ssl.endpoint.identification.algorithm", "https") \
    .option("kafka.sasl.mechanism", "PLAIN") \
    .save()
return df
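For reference, this is the workaround I was thinking of trying (just a sketch, assuming a hypothetical schema_id variable holding the registered id and the prepend_confluent_header helper above): wrap the to_avro output in a UDF that prepends the header before writing to Kafka.

from pyspark.sql.functions import udf
from pyspark.sql.types import BinaryType
from pyspark.sql.avro.functions import to_avro

# schema_id is hypothetical: the id Schema Registry assigned to the .avsc schema
add_header = udf(lambda payload: prepend_confluent_header(payload, schema_id), BinaryType())

df_out = df.select(add_header(to_avro(carrossel_schema, schema_str)).alias("value"))
# df_out would then be written with the same .write.format("kafka") options as above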