
Structured streaming job sees throughput being capped after running normally for a few days

databricksuser2
New Contributor II

The job (written in PySpark) uses Azure Event Hubs as the source and a Databricks Delta table as the sink. The job is hosted in Azure Databricks.

The transformation is simple: the message body is cast from bytes to a JSON string, and that string is added as a new column, something like:

from pyspark.sql.functions import col
from pyspark.sql.types import StringType

df = spark.readStream.format("eventhubs").load()  # connection options omitted
df = df.withColumn("stringBody", col("body").cast(StringType()))
query = df.writeStream.format("delta").outputMode("append").option("checkpointLocation", "someLocation").toTable("somedeltatable")

The job runs fine for the first few days: in the Event Hubs metrics, inbound and outbound traffic are about the same. After that, the outbound traffic appears to be capped, as in the figure:

[Figure 1]

At first I thought the Spark cluster might be stressed, so I checked its metrics. Both CPU and memory usage are well under 50%, so the cluster is not overloaded at all.

I then thought Event Hubs might be capping the outbound traffic for some reason, so I increased its throughput units (TUs), but that made no difference.

My reasoning is that if the job ran fine for the first few days, there shouldn't be any major issue in the code, and if the job cluster is not stressed, it's not a scaling issue either.

How should I approach this problem? Any pointer or direction would be appreciated.
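One possible starting point, offered as a rough sketch rather than a diagnosis: look at the per-batch progress that Structured Streaming reports and check whether every micro-batch reads almost exactly the same number of rows, which would point to a per-batch limit rather than to the cluster. The helper names below are hypothetical. The code assumes the progress entries are dict-like, as returned by `query.recentProgress` in PySpark, with the standard `numInputRows`, `inputRowsPerSecond`, `processedRowsPerSecond`, and `durationMs` fields. The 2% tolerance and the 5-batch minimum are arbitrary assumptions.

```python
def summarize_progress(progress_events):
    """Pick out the rate-related fields from dict-like progress entries."""
    return [
        {
            "batchId": p.get("batchId"),
            "numInputRows": p.get("numInputRows", 0),
            "inputRowsPerSecond": p.get("inputRowsPerSecond", 0.0),
            "processedRowsPerSecond": p.get("processedRowsPerSecond", 0.0),
            "triggerMs": p.get("durationMs", {}).get("triggerExecution"),
        }
        for p in progress_events
    ]


def input_rows_look_capped(progress_events, tolerance=0.02, min_batches=5):
    """Heuristic (an assumption, not a rule): True if the recent non-empty
    batches all read nearly the same number of rows."""
    counts = [p["numInputRows"] for p in progress_events if p.get("numInputRows")]
    if len(counts) < min_batches:
        return False  # too few batches to say anything
    top = max(counts)
    return all(c >= top * (1 - tolerance) for c in counts)


# Made-up sample data shaped like progress entries, for illustration only.
sample = [
    {"batchId": i, "numInputRows": 8000, "inputRowsPerSecond": 800.0,
     "processedRowsPerSecond": 1600.0, "durationMs": {"triggerExecution": 5000}}
    for i in range(6)
]
print(input_rows_look_capped(sample))
```

Against a running job this would be called as `input_rows_look_capped(query.recentProgress)`. A flat `numInputRows` while the cluster is mostly idle would suggest looking at the source's rate settings next.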

1 REPLY

Noopur_Nigam
Valued Contributor II

Hi @Databricks User10293847, you can try using auto-inflate to let the TUs increase automatically. The feature scales up automatically, to the maximum number of TUs you set, as your traffic increases. See the doc below:

https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-auto-inflate
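To judge whether TUs are the bottleneck at all, it may help to compare the plateau seen in the metrics with the egress ceiling for the current TU count. The sketch below uses the documented standard-tier limits per TU (egress up to 2 MB/s or 4096 events/s, ingress up to 1 MB/s or 1000 events/s). The function names and the 10% tolerance are hypothetical, chosen only for illustration.

```python
# Per-TU limits for the Event Hubs standard tier.
INGRESS_MB_PER_TU = 1.0       # up to 1 MB/s (or 1000 events/s) in
EGRESS_MB_PER_TU = 2.0        # up to 2 MB/s out
EGRESS_EVENTS_PER_TU = 4096   # or up to 4096 events/s out


def egress_ceiling(throughput_units):
    """Return the egress ceiling implied by a given TU count."""
    if throughput_units < 1:
        raise ValueError("throughput_units must be at least 1")
    return {
        "mb_per_sec": throughput_units * EGRESS_MB_PER_TU,
        "events_per_sec": throughput_units * EGRESS_EVENTS_PER_TU,
    }


def looks_tu_bound(observed_mb_per_sec, throughput_units, tolerance=0.1):
    """True if the observed plateau is within `tolerance` of the TU ceiling.
    The tolerance is an arbitrary assumption."""
    ceiling = egress_ceiling(throughput_units)["mb_per_sec"]
    return observed_mb_per_sec >= ceiling * (1 - tolerance)


print(egress_ceiling(2))
print(looks_tu_bound(1.0, 2))
```

If the observed plateau is well below the ceiling, adding TUs (manually or through auto-inflate) is unlikely to lift it, which would be consistent with the manual TU increase having had no effect.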
