Data Engineering

What are Best Practices for Spark streaming in Databricks

Srikanth_Gupta_
Valued Contributor

What are the best practices for Spark streaming in Databricks?

  1. Is it a good idea to consume multiple topics in one streaming job?
  2. Is autoscaling recommended for Spark streaming?
  3. How many worker nodes should we choose for a streaming job?
  4. When should we run OPTIMIZE for continuously streaming topics?
  5. Is there anything else to consider when implementing streaming jobs with high throughput?
2 REPLIES

User16826994223
Honored Contributor III


  1. Is it a good idea to consume multiple topics in one streaming job? - Yes, that is fine. You can create a fair scheduler pool for each stream and give each one its own share of the infrastructure, so that the streams do not interfere with each other.
  2. Is autoscaling recommended for Spark streaming? - No.
  3. How many worker nodes should we choose for a streaming job? - Plan for one core per partition.
  4. When should we run OPTIMIZE for continuously streaming topics? - At any time.
  5. Is there anything else to consider when implementing streaming jobs with high throughput? - Compute-optimized VMs are preferred as worker nodes.
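
To make points 1 and 3 a bit more concrete, here is a minimal sketch rather than a definitive setup. It assumes the cluster runs the fair scheduler (`spark.scheduler.mode` set to `FAIR`), that `spark` is an existing SparkSession, and that each stream writes to a Delta table. The helper names, pool names, table names, and checkpoint paths are hypothetical and only there for illustration.

```python
import math


def workers_for_partitions(total_partitions: int, cores_per_worker: int) -> int:
    """Workers needed so that every topic partition gets its own core."""
    if total_partitions < 1 or cores_per_worker < 1:
        raise ValueError("both arguments must be positive")
    return math.ceil(total_partitions / cores_per_worker)


def start_in_pool(spark, stream_df, pool_name, table_name, checkpoint_path):
    """Start one streaming query inside its own fair scheduler pool.

    `spark` is an existing SparkSession and `stream_df` is a streaming
    DataFrame, for example one topic read with spark.readStream.
    All names passed in here are illustrative assumptions.
    """
    # The pool is a thread-local property, so set it right before start-up:
    # the jobs of this query are then scheduled in `pool_name`.
    spark.sparkContext.setLocalProperty("spark.scheduler.pool", pool_name)
    return (
        stream_df.writeStream
        .format("delta")
        .option("checkpointLocation", checkpoint_path)  # one checkpoint per stream
        .queryName(pool_name)
        .toTable(table_name)  # starts the query and returns its handle
    )
```

For example, two topics with 12 and 20 partitions on 8-core workers would give `workers_for_partitions(12 + 20, 8)`, which is 4 workers. Calling `start_in_pool` once per topic, each time with a different pool name and checkpoint path, keeps the streams in one job while scheduling them separately.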

craig_ng
New Contributor III

See our docs for other considerations when deploying a production streaming job.
