How to improve Spark Streaming writer Input Rate and Processing rate?

RengarLee
Contributor

Hi!

I have several questions about Spark Streaming and Event Hubs.

Can you help me?

Q1: How can I improve the input rate and processing rate of a Spark Streaming writer?

I connect to Azure Event Hubs using Spark Streaming (Azure Databricks). When I use display, the input rate is very fast, but when I use a writer it is very slow. The result is shown in Picture.1, and the code is in Picture.2 and Picture.3. I want to raise the writer's input rate and processing rate until the event hub's outgoing bytes exceed its incoming bytes, as they do with display.

What should I do?

Q2: Why is setMaxEventsPerTrigger not equal to numInputRows?

I set setMaxEventsPerTrigger to 10000 on eventhubsConf, but numInputRows in RawData is 1000, as shown in Picture.5. Why doesn't numInputRows match the value I set with setMaxEventsPerTrigger?

Accepted Solutions

jose_gonzalez
Moderator

Hi @Rengar Lee,

How many Event Hubs partitions are you reading from? Check the Ganglia UI to see your cluster utilization. Also, how long does it take to write the data to the sink? You can get the query metrics from the Spark logs.
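As a rough illustration of the query metrics mentioned above, the sketch below summarizes one streaming progress record. It assumes the record has the shape of the JSON that Structured Streaming reports for each micro-batch (in PySpark, `query.lastProgress` returns it as a dict). The helper name `summarize_progress` and all sample values are made up for illustration; they are not taken from this thread's cluster.

```python
def summarize_progress(progress):
    """Summarize one Structured Streaming progress record (a plain dict)."""
    durations = progress.get("durationMs", {})
    # addBatch: time spent executing the micro-batch and writing it to the sink.
    add_batch_ms = durations.get("addBatch", 0)
    # triggerExecution: total time spent on the whole trigger.
    trigger_ms = durations.get("triggerExecution", 0)
    return {
        "batch_id": progress.get("batchId"),
        "num_input_rows": progress.get("numInputRows", 0),
        "input_rows_per_second": progress.get("inputRowsPerSecond"),
        "processed_rows_per_second": progress.get("processedRowsPerSecond"),
        "add_batch_ms": add_batch_ms,
        "trigger_ms": trigger_ms,
        # Fraction of the trigger spent in addBatch (0.0 if no timing is present).
        "add_batch_fraction": add_batch_ms / trigger_ms if trigger_ms else 0.0,
    }


# Hypothetical record; on a real cluster this would come from query.lastProgress.
sample_progress = {
    "batchId": 12,
    "numInputRows": 1000,
    "inputRowsPerSecond": 250.0,
    "processedRowsPerSecond": 400.0,
    "durationMs": {"addBatch": 2300, "triggerExecution": 2500},
}

summary = summarize_progress(sample_progress)
print(summary)
```

If `add_batch_fraction` is close to 1, most of each trigger is spent running and writing the batch, which points at the sink or the cluster rather than at the source.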

How many Event Hubs partitions are you reading from?

Only 1 partition.

Check your Ganglia UI to check your cluster utilization.

Picture.1

How long does it take to write the data to the sink?

Picture.2

I found that no matter what value I pass to setMaxEventsPerTrigger, numInputRows is always 1000.

Each write can't finish within 1 second, which delays the next write, so the event hub's outgoing bytes stay low.

If I could make numInputRows exceed 1000, I think the problem would be resolved.

The Event Hubs metrics are in Picture.3.
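The reasoning in this reply can be checked with simple arithmetic. The sketch below is a simplified model, not a measurement: it assumes every micro-batch carries a fixed number of rows and takes a fixed time to write, regardless of batch size. The numbers (a 2.5-second write and 3000 events per second arriving) are hypothetical; only the 1000 rows per batch comes from the thread.

```python
import math


def sustained_rows_per_second(rows_per_batch, batch_seconds):
    """Rows the query can drain per second when batches run back to back."""
    return rows_per_batch / batch_seconds


def min_rows_per_batch(incoming_rows_per_second, batch_seconds):
    """Smallest batch size that keeps up with the incoming rate,
    assuming the batch duration does not grow with batch size."""
    return math.ceil(incoming_rows_per_second * batch_seconds)


rows_per_batch = 1000   # numInputRows observed in the thread
batch_seconds = 2.5     # hypothetical time to write one batch
incoming = 3000.0       # hypothetical events per second arriving

throughput = sustained_rows_per_second(rows_per_batch, batch_seconds)
needed = min_rows_per_batch(incoming, batch_seconds)
falling_behind = throughput < incoming

print(f"throughput: {throughput} rows/s, needed per batch: {needed}, "
      f"falling behind: {falling_behind}")
```

Under these assumptions, a query capped at 1000 rows per batch drains 400 rows per second and falls behind. In practice larger batches usually take longer to write, so the real batch size needed may differ.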

RengarLee
Contributor

My problem is that numInputRows does not match the value I pass to setMaxEventsPerTrigger.