
What are Best Practices for Spark streaming in Databricks

Srikanth_Gupta_
Valued Contributor

What are the best practices for Spark streaming in Databricks?

  1. Is it a good idea to consume multiple topics in one streaming job?
  2. Is autoscaling recommended for Spark streaming?
  3. How many worker nodes should we choose for a streaming job?
  4. When should we run OPTIMIZE for continuously streaming topics?
  5. Are there any other things to consider when implementing streaming jobs with high throughput?

User16826994223
Honored Contributor III

What are best practices for Spark streaming in Databricks

  1. Is it a good idea to consume multiple topics in one streaming job? Yes, that is fine. You can create a fair scheduler pool for each stream so that each one gets its own share of the infrastructure and the streams do not interfere with one another.
  2. Is autoscaling recommended for Spark streaming? No.
  3. How many worker nodes should we choose for a streaming job? Plan for one core per partition.
  4. When should we run OPTIMIZE for continuously streaming topics? Any time.
  5. Any other things to consider when implementing streaming jobs with high throughput? Compute-optimized VMs are preferred as worker nodes.
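As a rough illustration of points 1 and 3, here is a minimal PySpark-style sketch. It assumes a Kafka source and a Delta sink, neither of which is stated in the thread. The helper names (`workers_needed`, `start_topic_stream`), the pool and query naming scheme, and the path layout are all hypothetical. The `spark.scheduler.pool` local property is the standard Spark setting for assigning jobs to a scheduler pool. Treat this as a starting point, not a definitive setup.

```python
import math


def workers_needed(total_partitions: int, cores_per_worker: int) -> int:
    """Worker count that gives every source partition its own core.

    Hypothetical helper: applies the "one core per partition" rule of
    thumb from the reply above and rounds up to whole workers.
    """
    if total_partitions < 1 or cores_per_worker < 1:
        raise ValueError("partition and core counts must be positive")
    return math.ceil(total_partitions / cores_per_worker)


def start_topic_stream(spark, topic, bootstrap_servers, base_path):
    """Start one streaming query for `topic` in its own scheduler pool.

    `spark` is an existing SparkSession. The pool name, query name and
    path layout below are illustrative assumptions.
    """
    # Jobs submitted from this thread after this call run in the named pool,
    # so set the property immediately before starting the query.
    spark.sparkContext.setLocalProperty("spark.scheduler.pool", f"pool_{topic}")

    source = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .load()
    )

    # Each query gets its own checkpoint location; sharing one would
    # corrupt the progress tracking of both queries.
    return (
        source.writeStream.queryName(f"ingest_{topic}")
        .format("delta")
        .option("checkpointLocation", f"{base_path}/_checkpoints/{topic}")
        .start(f"{base_path}/{topic}")
    )
```

For example, three topics with 8 partitions each on 8-core workers give `workers_needed(24, 8)`, which returns 3. Calling `start_topic_stream` once per topic from the same notebook or job then runs the three queries side by side, each in its own pool.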

craig_ng
New Contributor III

See our docs for other considerations when deploying a production streaming job.
