Hi!
I'm currently working on ingesting log data from Azure Event Hubs into Databricks. Initially, I was using a managed Databricks workspace, which couldn't access Event Hubs over a private endpoint. To resolve this, our DevOps team provisioned a VNet-injected workspace within the same virtual network as Event Hubs. This allowed successful ingestion, but only when using classic compute. Unfortunately, serverless compute still doesn't support this private endpoint setup.
Has anyone found a workaround for using serverless compute with Event Hubs over private endpoints? Or is classic compute the only viable option in this scenario?
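For context, the direction I've been reading about is Network Connectivity Configurations (NCCs): rather than VNet injection, serverless compute is apparently supposed to reach private resources through a private endpoint rule attached to an NCC at the account level. Below is a rough sketch of the account API call I believe creates such a rule. The account ID, NCC ID, token, and Event Hubs resource ID are placeholders, and the endpoint path and payload are from my reading of the docs, so treat this as an assumption rather than something I've verified end to end:

import requests

# All of these values are placeholders for illustration.
ACCOUNT_ID = "<databricks-account-id>"
NCC_ID = "<network-connectivity-config-id>"
EVENT_HUBS_RESOURCE_ID = (
    "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/"
    "Microsoft.EventHub/namespaces/<namespace>"
)

# Create a private endpoint rule on the NCC so serverless compute can reach
# the Event Hubs namespace privately (path/payload per my reading of the
# account API docs; may need adjusting).
resp = requests.post(
    f"https://accounts.azuredatabricks.net/api/2.0/accounts/{ACCOUNT_ID}"
    f"/network-connectivity-configs/{NCC_ID}/private-endpoint-rules",
    headers={"Authorization": "Bearer <account-admin-token>"},
    json={
        "resource_id": EVENT_HUBS_RESOURCE_ID,
        "group_id": "namespace",  # Event Hubs private endpoint sub-resource
    },
)
resp.raise_for_status()
print(resp.json())

Note that after creating the rule, the private endpoint connection would still need to be approved on the Event Hubs side in Azure.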
Below is the DLT pipeline code that I am using:
import dlt
from pyspark.sql.functions import col, from_json

# bootstrap_servers, event_hub, sasl_username, sasl_password, and
# json_schema are defined earlier in the pipeline (omitted here for brevity).

@dlt.table(
    name="poc_event_hub_process",
    comment="Raw ingestion from Event Hubs/Kafka into bronze layer",
    table_properties={"quality": "bronze", "delta.enableChangeDataFeed": "true"}
)
def poc_event_hub_process():
    # Read from the Event Hubs Kafka-compatible endpoint over SASL_SSL.
    df = (spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", event_hub)
        .option("kafka.security.protocol", "SASL_SSL")
        .option("kafka.sasl.mechanism", "PLAIN")
        .option("kafka.sasl.jaas.config",
                f'kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required username="{sasl_username}" password="{sasl_password}";')
        .option("failOnDataLoss", "false")
        .option("startingOffsets", "earliest")
        .load()
    )
    # Decode the Kafka payload bytes to a string, then parse the JSON.
    df = df.withColumn("value", col("value").cast("string"))
    return df.withColumn("value", from_json(col("value"), json_schema))
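One note on the credentials above, in case it matters for anyone reproducing this: with SASL PLAIN, the Event Hubs Kafka endpoint expects the literal username "$ConnectionString" and the namespace connection string as the password, which is best kept in a secret scope rather than hardcoded. A minimal sketch of how those variables could be populated (the scope and key names are placeholders):

# Placeholders: replace with your own secret scope and key names.
sasl_username = "$ConnectionString"  # literal username for Event Hubs SASL PLAIN
sasl_password = dbutils.secrets.get(scope="<scope>", key="<eventhubs-connection-string>")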
This is the error that I get when I change the compute to Serverless in the DLT pipeline settings:
terminated with exception: kafkashaded.org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: describeTopics
It seems like the connection is never being established. Is there something that can be done to resolve this?
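For what it's worth, a quick way to sanity-check whether the compute can reach the broker at all is a plain TCP probe run from a notebook on the same compute (the host below is a placeholder; the Event Hubs Kafka endpoint listens on port 9093):

import socket

# Placeholder: replace with your Event Hubs namespace FQDN.
host, port = "<namespace>.servicebus.windows.net", 9093

try:
    # If this times out on serverless but succeeds on classic compute,
    # the failure is in the network path, not the Kafka/DLT configuration.
    with socket.create_connection((host, port), timeout=5):
        print(f"TCP connection to {host}:{port} succeeded")
except OSError as e:
    print(f"TCP connection to {host}:{port} failed: {e}")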
I'd appreciate any insights or experiences from others who've tackled similar setups, especially if you've managed to get serverless compute to ingest from Event Hubs. Thanks in advance!