Hi @Areqio,

If you want to use Scala with Azure Event Hubs on Databricks Runtime 17.3 LTS (Scala 2.13), a practical approach is Structured Streaming via the Kafka-compatible endpoint of Event Hubs. Below is a reusable helper function to build the Kafka options, followed by an example of how to call it and what each part means.
// Builds the options needed to connect to Event Hubs through its Kafka-compatible endpoint.
def getKafkaOptions(
    env: String,              // currently unused; handy if you later vary settings per environment
    ehNameSpace: String,      // Event Hubs namespace (without the .servicebus.windows.net suffix)
    ehName: String,           // Event Hub name, used as the Kafka topic
    scopeName: String,        // Databricks secret scope holding the connection string
    kafkaOffset: String,      // "latest", "earliest", or a per-partition JSON string
    ehConnKey: String,        // secret key under which the connection string is stored
    maxOffsetsPerTrigger: String = "50000"
): Map[String, String] = {
  // Fetch the connection string from the secret scope rather than hardcoding it.
  val connStr = dbutils.secrets.get(scope = scopeName, key = ehConnKey)
  Map(
    "kafka.bootstrap.servers" -> s"$ehNameSpace.servicebus.windows.net:9093",
    "subscribe" -> ehName,
    "kafka.sasl.mechanism" -> "PLAIN",
    "kafka.security.protocol" -> "SASL_SSL",
    // Event Hubs expects the literal username "$ConnectionString"; the password is the connection string itself.
    "kafka.sasl.jaas.config" ->
      s"""kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required username="$$ConnectionString" password="$connStr";""",
    "startingOffsets" -> kafkaOffset,
    "failOnDataLoss" -> "false",
    "maxOffsetsPerTrigger" -> maxOffsetsPerTrigger
  )
}
val kafkaOptions = getKafkaOptions(
  env = "dev",
  ehNameSpace = "mynamespace",
  ehName = "myeventhub",
  scopeName = "my-secret-scope",
  kafkaOffset = "latest",
  ehConnKey = "eventhub-connection-string"
)

val df = spark.readStream
  .format("kafka")
  .options(kafkaOptions)
  .load()

val query = df.writeStream
  .format("delta")
  .option("checkpointLocation", "/mnt/checkpoints/eventhub")
  .start("/mnt/output/eventhub")
What This Means
getKafkaOptions(...):
Builds the configuration required to connect securely to Event Hubs using its Kafka-compatible endpoint.
dbutils.secrets.get(...):
Fetches the Event Hub connection string from a Databricks secret scope instead of hardcoding it (a quick way to verify the scope is shown after this list).
kafka.bootstrap.servers:
Points to the Kafka-compatible endpoint of your Event Hubs namespace (port 9093).
subscribe:
The Event Hub name (treated like a Kafka topic).
SASL/SSL configs:
The authentication settings Azure Event Hubs requires from Kafka clients (SASL PLAIN over TLS).
startingOffsets:
Controls where to start reading from:
"latest" → only new events
"earliest" → read from the beginning
A per-partition JSON value is also accepted; see the sketch after this list.
maxOffsetsPerTrigger:
Limits how much data is processed per micro-batch (helps control load).
readStream:
Creates a streaming DataFrame from Event Hubs. The key and value columns arrive as binary, so you will usually cast value before use (see the parsing sketch after this list).
writeStream:
Writes the streaming data to a Delta table (or any supported sink); the checkpointLocation is what lets the query restart from where it left off.
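As a quick sanity check, you can list the keys in the scope to confirm the connection-string secret exists (a minimal sketch; the scope name matches the example above, and secret values themselves are always redacted by Databricks):

// Print the key names stored in the scope; values are never exposed.
dbutils.secrets.list("my-secret-scope").foreach(s => println(s.key))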
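If you ever need to pin the starting position per partition instead of using "latest"/"earliest", the Kafka source also accepts a JSON string for startingOffsets. A sketch assuming myeventhub has two partitions (the offsets are placeholders; in this JSON form, -2 means earliest and -1 means latest):

// Start partition 0 at offset 1000 and partition 1 from earliest.
val pinnedOffsets = """{"myeventhub":{"0":1000,"1":-2}}"""
val pinnedOptions = kafkaOptions + ("startingOffsets" -> pinnedOffsets)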
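And since the Kafka source delivers key/value as binary, here is a minimal parsing-and-write sketch, assuming the event payloads are UTF-8 strings (Trigger.AvailableNow is optional; it drains everything currently available and then stops, which is handy for scheduled jobs):

import org.apache.spark.sql.functions.col
import org.apache.spark.sql.streaming.Trigger

val parsed = df.select(
  col("value").cast("string").as("body"),  // raw event payload as text
  col("timestamp").as("enqueuedTime")      // when Event Hubs received the event
)

val parsedQuery = parsed.writeStream
  .format("delta")
  .option("checkpointLocation", "/mnt/checkpoints/eventhub")
  .trigger(Trigger.AvailableNow())
  .start("/mnt/output/eventhub")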
Additional Notes
For Scala workloads, you'll likely need classic compute, as serverless support for Scala is still limited.
If you want a more managed approach with serverless, consider using Delta Live Tables / Lakeflow (Declarative Pipelines), but those currently favor Python/SQL over Scala.
Rohan