Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to improve Spark Streaming writer Input Rate and Processing rate?

RengarLee
Contributor

Hi!

I have several questions about Spark Streaming and Event Hubs.

Can you help me?

Q1: How can I improve the Spark Streaming writer's input rate and processing rate?

I connect to Azure Event Hubs using Spark Streaming (Azure Databricks). When I use display, the input rate is very high, but when I use a writer, it is very slow. The result is shown in Picture 1, and the code is in Picture 2 and Picture 3. I want to raise the writer's input rate and processing rate until the event hub's outgoing bytes exceed its incoming bytes, as they do with display.

What should I do?  

Q2: Why is setMaxEventsPerTrigger not equal to numInputRows?

I set setMaxEventsPerTrigger to 10000 on eventhubsConf, so why is numInputRows in RawData 1000, as shown in Picture 5? Is setMaxEventsPerTrigger not supposed to equal numInputRows?

1 ACCEPTED SOLUTION

jose_gonzalez
Databricks Employee

Hi @Rengar Lee,

How many Event Hubs partitions are you reading from? Check the Ganglia UI to see your cluster utilization. Also, how long does it take to write the data to the sink? You can get the query metrics from the Spark logs.
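
As a rough illustration of reading those query metrics: in PySpark, a running query's `lastProgress` returns the latest progress report as a dict with fields such as `numInputRows`, `inputRowsPerSecond`, `processedRowsPerSecond`, and `durationMs`. The sketch below works on such a dict. The helper name `summarize_progress` and the sample numbers are made up for illustration and do not come from this thread.

```python
from typing import Any, Dict


def summarize_progress(progress: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize one streaming progress report given as a plain dict."""
    durations = progress.get("durationMs", {})
    trigger_ms = durations.get("triggerExecution", 0)
    add_batch_ms = durations.get("addBatch", 0)
    return {
        "rows": progress.get("numInputRows", 0),
        "input_rows_per_s": progress.get("inputRowsPerSecond", 0.0),
        "processed_rows_per_s": progress.get("processedRowsPerSecond", 0.0),
        # Fraction of the trigger spent in addBatch, i.e. running the
        # micro-batch and writing it to the sink.
        "add_batch_share": add_batch_ms / trigger_ms if trigger_ms else 0.0,
    }


# Illustrative values only (hypothetical, not from the screenshots).
sample = {
    "numInputRows": 1000,
    "inputRowsPerSecond": 250.0,
    "processedRowsPerSecond": 200.0,
    "durationMs": {"addBatch": 4500, "triggerExecution": 5000},
}
summary = summarize_progress(sample)
print(summary)
```

If `add_batch_share` is close to 1, most of each trigger is going into executing and writing the batch, which points at the sink rather than the source.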

5 REPLIES

jose_gonzalez
Databricks Employee

Hi @Rengar Lee,

How many Event Hubs partitions are you reading from? Check the Ganglia UI to see your cluster utilization. Also, how long does it take to write the data to the sink? You can get the query metrics from the Spark logs.

How many Event Hubs partitions are you reading from?

Only 1 partition.

Check your Ganglia UI to check your cluster utilization.

See Picture 1.

What's the time it takes to write the data to the sink?

See Picture 2.

I found that no matter what value I pass to setMaxEventsPerTrigger, numInputRows is always 1000.

A batch of writes can't finish within 1 second, which delays the next write, so the event hub's outgoing bytes are low.
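
The arithmetic behind that delay can be sketched as follows. If each micro-batch carries a fixed number of rows and takes a certain time to write, the sustained throughput is rows per batch divided by batch duration, and anything arriving faster than that piles up. The 4-second batch time and the 400 rows/s incoming rate are hypothetical numbers chosen only to show the calculation.

```python
def sustained_rows_per_second(rows_per_batch: int, batch_seconds: float) -> float:
    """Throughput when batches run back to back with no idle time."""
    if batch_seconds <= 0:
        raise ValueError("batch_seconds must be positive")
    return rows_per_batch / batch_seconds


def backlog_growth_per_second(
    incoming_rows_per_second: float, rows_per_batch: int, batch_seconds: float
) -> float:
    """Rows per second by which the unread backlog grows (0 if keeping up)."""
    sustained = sustained_rows_per_second(rows_per_batch, batch_seconds)
    return max(0.0, incoming_rows_per_second - sustained)


# 1000 rows per batch is the value reported in the thread;
# the other two inputs are assumptions for illustration.
throughput = sustained_rows_per_second(rows_per_batch=1000, batch_seconds=4.0)
growth = backlog_growth_per_second(
    incoming_rows_per_second=400.0, rows_per_batch=1000, batch_seconds=4.0
)
print(throughput, growth)
```

Under these assumptions the stream can only catch up by writing each batch faster, by putting more rows in each batch, or both.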

If I could get numInputRows to exceed 1000, I think the problem would be resolved.
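
One possible reading of the constant 1000, offered as an assumption rather than a diagnosis: as I understand the azure-event-hubs-spark documentation, the default for maxEventsPerTrigger is the partition count times 1000, which for a single partition is exactly 1000. If that holds for the connector version in use, a batch size stuck at 1000 would suggest the configured value is not reaching the stream. The helper below is hypothetical and only encodes that comparison; please verify the default against the connector docs for your version.

```python
def explain_batch_size(
    num_input_rows: int,
    configured_cap: int,
    partition_count: int,
    default_per_partition: int = 1000,  # assumed connector default, unverified
) -> str:
    """Classify an observed batch size against the configured and default caps."""
    default_cap = partition_count * default_per_partition
    if num_input_rows == configured_cap:
        return "at configured cap"
    if num_input_rows == default_cap:
        return "at assumed default cap (setting may not be applied)"
    if num_input_rows < configured_cap:
        return "below configured cap (fewer events available)"
    return "above configured cap"


# Values reported in the thread: 1 partition, cap set to 10000, 1000 rows seen.
verdict = explain_batch_size(
    num_input_rows=1000, configured_cap=10000, partition_count=1
)
print(verdict)
```

If the verdict is the default-cap case, it may be worth checking that the configuration object passed to the reader is the same one the setting was applied to.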

The Event Hubs metrics are in Picture 3.

RengarLee
Contributor

My problem is that numInputRows does not match the value I set with setMaxEventsPerTrigger.
