
Trying to send data from a streaming table to an Azure Event Hub in a serverless cluster

Areqio
New Contributor II

Is there a way to stream data from Databricks to Azure Event Hubs in a serverless pipeline environment without using the azure-eventhub library, since it isn't compatible with serverless pipelines, and instead rely solely on the Kafka-compatible interface?

2 REPLIES

amirabedhiafi
New Contributor III

Hello @Areqio!

Yes, you can use Azure Event Hubs through its Kafka-compatible endpoint instead of the azure-eventhubs-spark / azure-eventhub connector. JVM libraries are not allowed in LSDP (serverless pipelines), so Event Hubs should be accessed through the built-in Spark Kafka connector.

For writing from a streaming table to Event Hubs, you use the pipeline sink API (dlt.create_sink) with format = "kafka" and an append_flow, as sketched below.

The event hub name is used as the Kafka topic, and the namespace is the Kafka bootstrap server on port 9093. Event Hubs supports the Kafka endpoint in the Standard, Premium, and Dedicated tiers, but not Basic.
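
As a concrete illustration, here is roughly how those pieces map onto Kafka options. The namespace, event hub name, and secret scope/key are hypothetical placeholders; the Event Hubs connection string is passed as the SASL PLAIN password with the literal username "$ConnectionString", and on Databricks the Kafka classes are shaded under the kafkashaded prefix:

# Hypothetical placeholders: replace with your own namespace, event hub, and secret.
EH_NAMESPACE = "my-namespace"
EH_NAME = "my-eventhub"  # the event hub plays the role of the Kafka topic

# The Event Hubs connection string acts as the SASL PLAIN password;
# the username is the literal string "$ConnectionString".
conn_str = dbutils.secrets.get("my-scope", "eh-connection-string")

eventhub_kafka_options = {
    "kafka.bootstrap.servers": f"{EH_NAMESPACE}.servicebus.windows.net:9093",
    "kafka.security.protocol": "SASL_SSL",
    "kafka.sasl.mechanism": "PLAIN",
    "kafka.sasl.jaas.config": (
        "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule "
        f'required username="$ConnectionString" password="{conn_str}";'
    ),
    "topic": EH_NAME,
}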

For the Kafka writer, the outgoing DataFrame must contain a value column; it can optionally also contain key, topic, partition, and headers columns.

A few limitations to keep in mind: the sink API is Python-only, supports streaming queries only, and must be written to through append_flow. Also, a full refresh does not clean up data already written to an external sink, so replayed data can be appended again.
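
Putting it together, a minimal sketch of such a pipeline (assuming a notebook context where spark and dbutils are available; the table name, sink name, and connection details are hypothetical placeholders):

import dlt
from pyspark.sql.functions import to_json, struct

# Same hypothetical Event Hubs Kafka options as in the previous snippet.
conn_str = dbutils.secrets.get("my-scope", "eh-connection-string")
eventhub_kafka_options = {
    "kafka.bootstrap.servers": "my-namespace.servicebus.windows.net:9093",
    "kafka.security.protocol": "SASL_SSL",
    "kafka.sasl.mechanism": "PLAIN",
    "kafka.sasl.jaas.config": (
        "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule "
        f'required username="$ConnectionString" password="{conn_str}";'
    ),
    "topic": "my-eventhub",
}

# Declare the external Kafka sink once in the pipeline.
dlt.create_sink(
    name="eventhub_sink",
    format="kafka",
    options=eventhub_kafka_options,
)

# Stream the source streaming table into the sink. The Kafka writer requires
# a "value" column, so each row is serialized to JSON here.
@dlt.append_flow(name="to_eventhub_flow", target="eventhub_sink")
def to_eventhub_flow():
    return (
        spark.readStream.table("my_streaming_table")  # use the fully qualified name as needed
        .select(to_json(struct("*")).alias("value"))
    )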

 

If this answer resolves your question, could you please mark it as "Accept as Solution"? It will help other users quickly find the correct fix.

Senior BI/Data Engineer | Microsoft MVP Data Platform | Microsoft MVP Power BI | Power BI Super User | C# Corner MVP

Thank you for the reply! Do you have a code example? I was able to do it for a Delta table through a notebook on a serverless cluster using Kafka, but I cannot do it for a streaming table or in a DLT pipeline. What I have working in the notebook is roughly the sketch below.
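
A rough sketch of that notebook pattern (table name, checkpoint path, and connection details are hypothetical placeholders, reusing the same Event Hubs Kafka options as above):

from pyspark.sql.functions import to_json, struct

# Plain Structured Streaming write to the Event Hubs Kafka endpoint.
conn_str = dbutils.secrets.get("my-scope", "eh-connection-string")
(
    spark.readStream.table("my_delta_table")
    .select(to_json(struct("*")).alias("value"))  # Kafka writer requires a "value" column
    .writeStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "my-namespace.servicebus.windows.net:9093")
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option(
        "kafka.sasl.jaas.config",
        "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule "
        f'required username="$ConnectionString" password="{conn_str}";',
    )
    .option("topic", "my-eventhub")
    .option("checkpointLocation", "/Volumes/main/default/checkpoints/eh_demo")
    .start()
)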