Data Engineering

What are best practices for Spark streaming in Databricks?

Srikanth_Gupta_
Databricks Employee

What are best practices for Spark streaming in Databricks?

  1. Is it a good idea to consume multiple topics in one streaming job?
  2. Is autoscaling recommended for Spark streaming?
  3. How many worker nodes should we choose for a streaming job?
  4. When should we run OPTIMIZE for continuously streaming topics?
  5. Are there any other things to consider when implementing streaming jobs with high throughput?
2 REPLIES

User16826994223
Honored Contributor III

What are best practices for Spark streaming in Databricks?

  1. Is it a good idea to consume multiple topics in one streaming job? Yes, that is fine. Create a fair scheduler pool for each stream so that each one gets its own share of the infrastructure and the streams do not interfere with one another.
  2. Is autoscaling recommended for Spark streaming? No.
  3. How many worker nodes should we choose for a streaming job? Size the cluster for one core per partition.
  4. When should we run OPTIMIZE for continuously streaming topics? Any time.
  5. Any other things to consider for streaming jobs with high throughput? Compute-optimized VMs are preferred as worker nodes.
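
To make points 1 and 3 concrete, here is a minimal sketch, assuming PySpark Structured Streaming on a cluster whose `spark.scheduler.mode` is set to `FAIR`. The helper names `start_in_pool` and `workers_needed`, the pool names, and the `start_query` callable are hypothetical and only for illustration; `setLocalProperty("spark.scheduler.pool", ...)` is the standard Spark call for assigning work to a scheduler pool.

```python
import math


def start_in_pool(spark, pool_name, start_query):
    """Start one streaming query inside its own fair-scheduler pool.

    `spark` is an active SparkSession. `start_query` is a hypothetical
    zero-argument callable that builds the stream and calls .start().
    """
    # The pool is a thread-local property, so set it immediately before
    # starting the query that should run in that pool.
    spark.sparkContext.setLocalProperty("spark.scheduler.pool", pool_name)
    return start_query()


def workers_needed(total_partitions, cores_per_worker):
    """Workers required to give every topic partition one core."""
    if total_partitions < 0 or cores_per_worker <= 0:
        raise ValueError("partitions must be >= 0 and cores per worker > 0")
    # Round up: a partially used worker still has to be provisioned.
    return math.ceil(total_partitions / cores_per_worker)
```

For example, if all consumed topics together have 24 partitions and each worker has 8 cores, `workers_needed(24, 8)` gives 3. On a real cluster you would call `start_in_pool(spark, "pool_a", ...)` once per stream, passing a different pool name each time so the streams do not compete for the same scheduler slots.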

craig_ng
New Contributor III

See our docs for other considerations when deploying a production streaming job.