cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Structured streaming job sees throughput being capped after running normally for a few days

databricksuser2
New Contributor II

The job (written in PySpark) uses azure eventhub as source and use Databricks delta table as sink. The job is hosted in Azure Databricks.

Transformation part is simple, the message body is converted from bytes to json string, the json string is then added as a new column, something like:

df = spark.readStream.format("eventhubs").load()
df = df.withColumn("stringBody", col("Body").cast(stringType()))
query = df.writeStream.format("delta").outputMode("append").option("checkpointLocation", "someLocation").toTable("somedeltatable")

The job is running ok for the first few days, I observe the metric of eventhub, inbound and outbound traffic are about the same. Then I see the outbound traffic seems to be capped, something like in the figure:

figure 1At first I thought maybe the spark cluster is stressed, I checked the metric, both cpu and memory usage are well under 50%, the cluster is not overloaded at all.

I then thought maybe eventhub is capping the outbound traffic for some reason, so I increased 'throughput unit' of eventhub, but it didn't make any difference.

I'm thinking, if the job ran ok for the first few days, then there shouldn't be any major issue code wise. If the job cluster is not stressed, then it's not about scaling issue.

How should I approach this problem? Any pointer or direction would be appreciated.

1 REPLY 1

Noopur_Nigam
Valued Contributor II
Valued Contributor II

Hi @Databricks User10293847​ You can try using auto-inflate and let the TU increase automatically. The feature then scales automatically to the maximum limit of TUs you need, depending on the increase in your traffic. You can check the below doc:

https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-auto-inflate

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.