<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: How to Stream Azure event hub to databricks delta table in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-stream-azure-event-hub-to-databricks-delta-table/m-p/156034#M54342</link>
    <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/220925"&gt;@Areqio&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;+1 to what &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/210897"&gt;@balajij8&lt;/a&gt; suggested about using &lt;STRONG&gt;Lakeflow Declarative Pipelines&lt;/STRONG&gt; as the simplest, supported way to land &lt;STRONG&gt;Azure Event Hubs IoT data into Delta&lt;/STRONG&gt;. Lakeflow Declarative Pipelines are built on top of &lt;STRONG&gt;Structured Streaming&lt;/STRONG&gt;, so you get robust pipeline orchestration from a small amount of declarative Python/SQL instead of wiring everything up manually in Scala.&lt;/P&gt;
&lt;P&gt;A few concrete points from the official docs on &lt;STRONG&gt;“Use Azure Event Hubs as a pipeline data source”&lt;/STRONG&gt; that are worth calling out:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;&lt;STRONG&gt;Use the Kafka endpoint, not the old Event Hubs connector&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;The classic &lt;CODE&gt;azure-event-hubs-spark&lt;/CODE&gt; connector is &lt;STRONG&gt;not&lt;/STRONG&gt; available in Databricks Runtime for Lakeflow, and Lakeflow pipelines don’t allow adding third-party JVM libraries.&lt;/LI&gt;
&lt;LI&gt;Instead, Event Hubs exposes an &lt;STRONG&gt;Apache Kafka–compatible endpoint&lt;/STRONG&gt; that you read with the built-in &lt;STRONG&gt;Structured Streaming Kafka connector&lt;/STRONG&gt; that’s already in the runtime.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;&lt;STRONG&gt;Authenticate with SAS via secrets&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Event Hubs gives you a &lt;STRONG&gt;namespace&lt;/STRONG&gt;, &lt;STRONG&gt;hub name&lt;/STRONG&gt;, and a &lt;STRONG&gt;shared access policy (name + key)&lt;/STRONG&gt;; the docs recommend putting the key into a &lt;STRONG&gt;Databricks secret scope&lt;/STRONG&gt; (via CLI) and reading it in the pipeline rather than hard-coding it.&lt;/LI&gt;
&lt;LI&gt;The example builds a connection string like:&lt;BR /&gt;&lt;CODE&gt;Endpoint=sb://{NAMESPACE}.servicebus.windows.net/;SharedAccessKeyName={POLICY_NAME};SharedAccessKey={POLICY_KEY}&lt;/CODE&gt; and then uses that in the Kafka &lt;CODE&gt;sasl.jaas.config&lt;/CODE&gt; option.&lt;/LI&gt;
&lt;/UL&gt;
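&lt;P&gt;As a sketch of the connection string and SASL options above (plain Python; the namespace, hub, policy name, and key here are placeholder assumptions, and in a real pipeline the key would come from a secret scope rather than the source):&lt;/P&gt;

```python
# Sketch only: building the Event Hubs connection string and the Kafka
# SASL_SSL options described above. All values below are assumed
# placeholders; in a real pipeline the key comes from dbutils.secrets.get.
EH_NAMESPACE = "my-namespace"
EH_NAME = "my-event-hub"
POLICY_NAME = "my-policy"
POLICY_KEY = "my-sas-key"  # store in a Databricks secret scope, never in code

connection_string = (
    f"Endpoint=sb://{EH_NAMESPACE}.servicebus.windows.net/;"
    f"SharedAccessKeyName={POLICY_NAME};"
    f"SharedAccessKey={POLICY_KEY}"
)

# The Event Hubs Kafka endpoint uses SASL PLAIN with the literal username
# "$ConnectionString" and the connection string itself as the password.
KAFKA_OPTIONS = {
    "kafka.bootstrap.servers": f"{EH_NAMESPACE}.servicebus.windows.net:9093",
    "subscribe": EH_NAME,
    "kafka.security.protocol": "SASL_SSL",
    "kafka.sasl.mechanism": "PLAIN",
    "kafka.sasl.jaas.config": (
        "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule "
        f'required username="$ConnectionString" password="{connection_string}";'
    ),
}
```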
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;&lt;STRONG&gt;Minimal Lakeflow pipeline pattern (Python)&lt;/STRONG&gt;&lt;BR /&gt;The docs show a pattern roughly like this (simplified here):&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Configure Event Hubs and Kafka options from &lt;STRONG&gt;pipeline settings&lt;/STRONG&gt; (e.g. &lt;CODE&gt;iot.ingestion.eh.namespace&lt;/CODE&gt;, &lt;CODE&gt;iot.ingestion.eh.name&lt;/CODE&gt;, &lt;CODE&gt;iot.ingestion.kafka.requestTimeout&lt;/CODE&gt;, etc.).&lt;/LI&gt;
&lt;LI&gt;Store the SAS key in a &lt;STRONG&gt;secret scope&lt;/STRONG&gt;, retrieve it with &lt;CODE&gt;dbutils.secrets.get&lt;/CODE&gt;, and construct the Kafka &lt;CODE&gt;SASL_SSL&lt;/CODE&gt; options.&lt;/LI&gt;
&lt;LI&gt;In the pipeline code, use &lt;CODE&gt;spark.readStream.format("kafka").options(**KAFKA_OPTIONS).load()&lt;/CODE&gt; to read from the Event Hubs topic and then parse your IoT JSON payload into a typed schema before writing to a &lt;STRONG&gt;Delta table&lt;/STRONG&gt; (typically a bronze table with date partitioning on an event or enqueue timestamp).&lt;/LI&gt;
&lt;/UL&gt;
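&lt;P&gt;The parse-into-a-typed-schema step can be sketched in plain Python (field names like deviceId and enqueuedTime are assumptions about the payload; in the pipeline this would be from_json on the Kafka value column plus a date-typed partition column on the bronze table):&lt;/P&gt;

```python
import json
from datetime import datetime

# Sketch only: parsing an IoT JSON payload and deriving the date used to
# partition the bronze Delta table. Field names (deviceId, temperature,
# enqueuedTime) are assumed for illustration.
def parse_iot_event(raw_value):
    payload = json.loads(raw_value)
    enqueued = datetime.fromisoformat(payload["enqueuedTime"])
    return {
        "device_id": payload["deviceId"],
        "temperature": float(payload["temperature"]),
        "enqueued_time": enqueued,
        "event_date": enqueued.date().isoformat(),  # partition column
    }

event = parse_iot_event(
    '{"deviceId": "sensor-1", "temperature": "21.5", '
    '"enqueuedTime": "2026-03-17T17:18:39+00:00"}'
)
```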
&lt;P&gt;The official doc (AWS flavor but the Event Hubs/Kafka wiring is the same on Azure) is here:&lt;BR /&gt;&lt;STRONG&gt;Use Azure Event Hubs as a pipeline data source&lt;/STRONG&gt; – &lt;A href="https://docs.databricks.com/aws/en/ldp/event-hubs" target="_blank"&gt;https://docs.databricks.com/aws/en/ldp/event-hubs&lt;/A&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;&lt;STRONG&gt;Where your 17.3 LTS / Scala 2.13 fits in&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Your compute runtime (17.3 LTS, Scala 2.13) is fine for running the &lt;STRONG&gt;Kafka structured streaming&lt;/STRONG&gt; side of this; the key change is that the &lt;EM&gt;pipeline definition&lt;/EM&gt; itself is usually written in &lt;STRONG&gt;Python or SQL&lt;/STRONG&gt; for Lakeflow. The docs only show Python examples today; I don’t know of a Scala API for Lakeflow declarative pipelines at this time.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;If you prefer not to adopt Lakeflow yet, you can still follow the &lt;STRONG&gt;same Kafka configuration&lt;/STRONG&gt; (using the Event Hubs Kafka endpoint + SASL_SSL) directly in a classic Structured Streaming job and then write to a Delta table from Scala.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Hope this helps.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Cheers, Lou.&lt;/P&gt;</description>
    <pubDate>Mon, 04 May 2026 01:54:49 GMT</pubDate>
    <dc:creator>Louis_Frolio</dc:creator>
    <dc:date>2026-05-04T01:54:49Z</dc:date>
    <item>
      <title>How to Stream Azure event hub to databricks delta table</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-stream-azure-event-hub-to-databricks-delta-table/m-p/151169#M53606</link>
      <description>&lt;P&gt;I am trying to stream my IoT data from Azure Event Hubs to Databricks. I'm running Databricks Runtime 17.3 LTS with Scala 2.13.&lt;/P&gt;</description>
      <pubDate>Tue, 17 Mar 2026 17:18:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-stream-azure-event-hub-to-databricks-delta-table/m-p/151169#M53606</guid>
      <dc:creator>Areqio</dc:creator>
      <dc:date>2026-03-17T17:18:39Z</dc:date>
    </item>
    <item>
      <title>Re: How to Stream Azure event hub to databricks delta table</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-stream-azure-event-hub-to-databricks-delta-table/m-p/151175#M53608</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/220925"&gt;@Areqio&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;You can use Lakeflow Declarative Pipelines to stream Azure Event Hub IoT data into Databricks delta tables. Lakeflow Spark Declarative Pipelines extends functionality in Spark Structured Streaming and allows you to write just a few lines of declarative Python or SQL to create robust pipelines.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;More details &lt;A href="https://docs.databricks.com/aws/en/ldp/event-hubs" target="_self"&gt;here&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 17 Mar 2026 18:14:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-stream-azure-event-hub-to-databricks-delta-table/m-p/151175#M53608</guid>
      <dc:creator>balajij8</dc:creator>
      <dc:date>2026-03-17T18:14:48Z</dc:date>
    </item>
    <item>
      <title>Re: How to Stream Azure event hub to databricks delta table</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-stream-azure-event-hub-to-databricks-delta-table/m-p/156034#M54342</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/220925"&gt;@Areqio&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;+1 to what &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/210897"&gt;@balajij8&lt;/a&gt; suggested about using &lt;STRONG&gt;Lakeflow Declarative Pipelines&lt;/STRONG&gt; as the simplest, supported way to land &lt;STRONG&gt;Azure Event Hubs IoT data into Delta&lt;/STRONG&gt;. Lakeflow Declarative Pipelines are built on top of &lt;STRONG&gt;Structured Streaming&lt;/STRONG&gt;, so you get robust pipeline orchestration from a small amount of declarative Python/SQL instead of wiring everything up manually in Scala.&lt;/P&gt;
&lt;P&gt;A few concrete points from the official docs on &lt;STRONG&gt;“Use Azure Event Hubs as a pipeline data source”&lt;/STRONG&gt; that are worth calling out:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;P&gt;&lt;STRONG&gt;Use the Kafka endpoint, not the old Event Hubs connector&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;The classic &lt;CODE&gt;azure-event-hubs-spark&lt;/CODE&gt; connector is &lt;STRONG&gt;not&lt;/STRONG&gt; available in Databricks Runtime for Lakeflow, and Lakeflow pipelines don’t allow adding third-party JVM libraries.&lt;/LI&gt;
&lt;LI&gt;Instead, Event Hubs exposes an &lt;STRONG&gt;Apache Kafka–compatible endpoint&lt;/STRONG&gt; that you read with the built-in &lt;STRONG&gt;Structured Streaming Kafka connector&lt;/STRONG&gt; that’s already in the runtime.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;&lt;STRONG&gt;Authenticate with SAS via secrets&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Event Hubs gives you a &lt;STRONG&gt;namespace&lt;/STRONG&gt;, &lt;STRONG&gt;hub name&lt;/STRONG&gt;, and a &lt;STRONG&gt;shared access policy (name + key)&lt;/STRONG&gt;; the docs recommend putting the key into a &lt;STRONG&gt;Databricks secret scope&lt;/STRONG&gt; (via CLI) and reading it in the pipeline rather than hard-coding it.&lt;/LI&gt;
&lt;LI&gt;The example builds a connection string like:&lt;BR /&gt;&lt;CODE&gt;Endpoint=sb://{NAMESPACE}.servicebus.windows.net/;SharedAccessKeyName={POLICY_NAME};SharedAccessKey={POLICY_KEY}&lt;/CODE&gt; and then uses that in the Kafka &lt;CODE&gt;sasl.jaas.config&lt;/CODE&gt; option.&lt;/LI&gt;
&lt;/UL&gt;
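&lt;P&gt;As a sketch of the connection string and SASL options above (plain Python; the namespace, hub, policy name, and key here are placeholder assumptions, and in a real pipeline the key would come from a secret scope rather than the source):&lt;/P&gt;

```python
# Sketch only: building the Event Hubs connection string and the Kafka
# SASL_SSL options described above. All values below are assumed
# placeholders; in a real pipeline the key comes from dbutils.secrets.get.
EH_NAMESPACE = "my-namespace"
EH_NAME = "my-event-hub"
POLICY_NAME = "my-policy"
POLICY_KEY = "my-sas-key"  # store in a Databricks secret scope, never in code

connection_string = (
    f"Endpoint=sb://{EH_NAMESPACE}.servicebus.windows.net/;"
    f"SharedAccessKeyName={POLICY_NAME};"
    f"SharedAccessKey={POLICY_KEY}"
)

# The Event Hubs Kafka endpoint uses SASL PLAIN with the literal username
# "$ConnectionString" and the connection string itself as the password.
KAFKA_OPTIONS = {
    "kafka.bootstrap.servers": f"{EH_NAMESPACE}.servicebus.windows.net:9093",
    "subscribe": EH_NAME,
    "kafka.security.protocol": "SASL_SSL",
    "kafka.sasl.mechanism": "PLAIN",
    "kafka.sasl.jaas.config": (
        "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule "
        f'required username="$ConnectionString" password="{connection_string}";'
    ),
}
```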
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;&lt;STRONG&gt;Minimal Lakeflow pipeline pattern (Python)&lt;/STRONG&gt;&lt;BR /&gt;The docs show a pattern roughly like this (simplified here):&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Configure Event Hubs and Kafka options from &lt;STRONG&gt;pipeline settings&lt;/STRONG&gt; (e.g. &lt;CODE&gt;iot.ingestion.eh.namespace&lt;/CODE&gt;, &lt;CODE&gt;iot.ingestion.eh.name&lt;/CODE&gt;, &lt;CODE&gt;iot.ingestion.kafka.requestTimeout&lt;/CODE&gt;, etc.).&lt;/LI&gt;
&lt;LI&gt;Store the SAS key in a &lt;STRONG&gt;secret scope&lt;/STRONG&gt;, retrieve it with &lt;CODE&gt;dbutils.secrets.get&lt;/CODE&gt;, and construct the Kafka &lt;CODE&gt;SASL_SSL&lt;/CODE&gt; options.&lt;/LI&gt;
&lt;LI&gt;In the pipeline code, use &lt;CODE&gt;spark.readStream.format("kafka").options(**KAFKA_OPTIONS).load()&lt;/CODE&gt; to read from the Event Hubs topic and then parse your IoT JSON payload into a typed schema before writing to a &lt;STRONG&gt;Delta table&lt;/STRONG&gt; (typically a bronze table with date partitioning on an event or enqueue timestamp).&lt;/LI&gt;
&lt;/UL&gt;
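&lt;P&gt;The parse-into-a-typed-schema step can be sketched in plain Python (field names like deviceId and enqueuedTime are assumptions about the payload; in the pipeline this would be from_json on the Kafka value column plus a date-typed partition column on the bronze table):&lt;/P&gt;

```python
import json
from datetime import datetime

# Sketch only: parsing an IoT JSON payload and deriving the date used to
# partition the bronze Delta table. Field names (deviceId, temperature,
# enqueuedTime) are assumed for illustration.
def parse_iot_event(raw_value):
    payload = json.loads(raw_value)
    enqueued = datetime.fromisoformat(payload["enqueuedTime"])
    return {
        "device_id": payload["deviceId"],
        "temperature": float(payload["temperature"]),
        "enqueued_time": enqueued,
        "event_date": enqueued.date().isoformat(),  # partition column
    }

event = parse_iot_event(
    '{"deviceId": "sensor-1", "temperature": "21.5", '
    '"enqueuedTime": "2026-03-17T17:18:39+00:00"}'
)
```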
&lt;P&gt;The official doc (AWS flavor but the Event Hubs/Kafka wiring is the same on Azure) is here:&lt;BR /&gt;&lt;STRONG&gt;Use Azure Event Hubs as a pipeline data source&lt;/STRONG&gt; – &lt;A href="https://docs.databricks.com/aws/en/ldp/event-hubs" target="_blank"&gt;https://docs.databricks.com/aws/en/ldp/event-hubs&lt;/A&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;&lt;STRONG&gt;Where your 17.3 LTS / Scala 2.13 fits in&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Your compute runtime (17.3 LTS, Scala 2.13) is fine for running the &lt;STRONG&gt;Kafka structured streaming&lt;/STRONG&gt; side of this; the key change is that the &lt;EM&gt;pipeline definition&lt;/EM&gt; itself is usually written in &lt;STRONG&gt;Python or SQL&lt;/STRONG&gt; for Lakeflow. The docs only show Python examples today; I don’t know of a Scala API for Lakeflow declarative pipelines at this time.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;If you prefer not to adopt Lakeflow yet, you can still follow the &lt;STRONG&gt;same Kafka configuration&lt;/STRONG&gt; (using the Event Hubs Kafka endpoint + SASL_SSL) directly in a classic Structured Streaming job and then write to a Delta table from Scala.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Hope this helps.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Cheers, Lou.&lt;/P&gt;</description>
      <pubDate>Mon, 04 May 2026 01:54:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-stream-azure-event-hub-to-databricks-delta-table/m-p/156034#M54342</guid>
      <dc:creator>Louis_Frolio</dc:creator>
      <dc:date>2026-05-04T01:54:49Z</dc:date>
    </item>
    <item>
      <title>Re: How to Stream Azure event hub to databricks delta table</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-stream-azure-event-hub-to-databricks-delta-table/m-p/156099#M54354</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/220925"&gt;@Areqio&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;If you want to use Scala with &lt;STRONG&gt;Azure Event Hubs in Databricks Runtime 17.3 LTS (Scala 2.13)&lt;/STRONG&gt;, a practical approach is Structured Streaming via the Kafka endpoint of Event Hubs. Below is a reusable helper function that builds the Kafka options, followed by an example of how to call it and what each part means.&lt;/P&gt;&lt;LI-CODE lang="scala"&gt;def getKafkaOptions(
  env: String, // currently unused; kept for call sites that vary by environment
  ehNameSpace: String,
  ehName: String,
  scopeName: String,
  kafkaOffset: String,
  ehConnKey: String,
  maxOffsetsPerTrigger: String = "50000"
): Map[String, String] = {

  // Fetch the Event Hub connection string from a Databricks secret scope
  val connStr = dbutils.secrets.get(scope = scopeName, key = ehConnKey)

  Map(
    "kafka.bootstrap.servers" -&amp;gt; s"$ehNameSpace.servicebus.windows.net:9093",
    "subscribe" -&amp;gt; ehName,
    "kafka.sasl.mechanism" -&amp;gt; "PLAIN",
    "kafka.security.protocol" -&amp;gt; "SASL_SSL",
    "kafka.sasl.jaas.config" -&amp;gt;
      s"""kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required username="$$ConnectionString" password="$connStr";""",
    "startingOffsets" -&amp;gt; kafkaOffset,
    "failOnDataLoss" -&amp;gt; "false",
    "maxOffsetsPerTrigger" -&amp;gt; maxOffsetsPerTrigger
  )
}

val kafkaOptions = getKafkaOptions(
  env = "dev",
  ehNameSpace = "mynamespace",
  ehName = "myeventhub",
  scopeName = "my-secret-scope",
  kafkaOffset = "latest",
  ehConnKey = "eventhub-connection-string"
)

// Read from the Event Hubs Kafka endpoint as a streaming DataFrame
val df = spark.readStream
  .format("kafka")
  .options(kafkaOptions)
  .load()

// Write the stream to a Delta table, with a checkpoint for fault tolerance
val query = df.writeStream
  .format("delta")
  .option("checkpointLocation", "/mnt/checkpoints/eventhub")
  .start("/mnt/output/eventhub")&lt;/LI-CODE&gt;&lt;P&gt;&lt;STRONG&gt;What This Means&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;CODE&gt;getKafkaOptions(...)&lt;/CODE&gt;: builds the configuration required to connect securely to Event Hubs through its Kafka-compatible endpoint.&lt;/LI&gt;&lt;LI&gt;&lt;CODE&gt;dbutils.secrets.get(...)&lt;/CODE&gt;: fetches the Event Hub connection string from a Databricks secret scope instead of hardcoding it.&lt;/LI&gt;&lt;LI&gt;&lt;CODE&gt;kafka.bootstrap.servers&lt;/CODE&gt;: points to your Event Hub namespace's Kafka endpoint (port 9093).&lt;/LI&gt;&lt;LI&gt;&lt;CODE&gt;subscribe&lt;/CODE&gt;: the Event Hub name, treated like a Kafka topic.&lt;/LI&gt;&lt;LI&gt;SASL/SSL configs: the authentication mechanism required for connecting to Azure Event Hubs via Kafka.&lt;/LI&gt;&lt;LI&gt;&lt;CODE&gt;startingOffsets&lt;/CODE&gt;: controls where reading starts; &lt;CODE&gt;"latest"&lt;/CODE&gt; reads only new events, &lt;CODE&gt;"earliest"&lt;/CODE&gt; reads from the beginning.&lt;/LI&gt;&lt;LI&gt;&lt;CODE&gt;maxOffsetsPerTrigger&lt;/CODE&gt;: limits how much data is processed per micro-batch, which helps control load.&lt;/LI&gt;&lt;LI&gt;&lt;CODE&gt;readStream&lt;/CODE&gt;: creates a streaming DataFrame from Event Hubs.&lt;/LI&gt;&lt;LI&gt;&lt;CODE&gt;writeStream&lt;/CODE&gt;: writes the streaming data to a Delta table (or any supported sink).&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;Additional Notes&lt;/STRONG&gt;&lt;BR /&gt;For Scala workloads, you’ll likely need classic compute, as serverless support for Scala is still limited.&lt;BR /&gt;If you want a more managed approach with serverless, consider Lakeflow Declarative Pipelines (formerly Delta Live Tables), but those currently favor Python/SQL over Scala.&lt;/P&gt;</description>
      <pubDate>Mon, 04 May 2026 17:09:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-stream-azure-event-hub-to-databricks-delta-table/m-p/156099#M54354</guid>
      <dc:creator>rohan22sri</dc:creator>
      <dc:date>2026-05-04T17:09:56Z</dc:date>
    </item>
  </channel>
</rss>

