User16826994223
Databricks Employee

What are the best practices for Spark streaming in Databricks?

  1. Is it a good idea to consume multiple topics in one streaming job? - Yes, that is fine. Create a fair scheduler pool for each stream so that each one gets its own share of the cluster's resources and the streams do not interfere with each other.
  2. Is autoscaling recommended for Spark streaming? - No.
  3. How many worker nodes should we choose for a streaming job? - Provision one core per partition of the source topic(s).
  4. When should we run OPTIMIZE on the tables fed by continuously streaming topics? - At any time. OPTIMIZE only compacts existing files, so it does not conflict with concurrent streaming appends.
  5. Is there anything else to consider when implementing streaming jobs with high throughput? - Compute-optimized VM types are preferred for the nodes.
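A minimal PySpark sketch of item 1, assuming the cluster runs the FAIR scheduler (`spark.scheduler.mode` set to `FAIR`) and that `spark` is an existing `SparkSession`, such as the one a Databricks notebook provides. The helper name `start_topic_streams`, the broker address, topic names and paths are hypothetical. Each topic gets its own Kafka-to-Delta query, and the `spark.scheduler.pool` local property is set before each `start()` so that query's jobs are scheduled in a pool of their own.

```python
# Sketch: one streaming query per Kafka topic, each in its own fair-scheduler pool.
# Assumptions: spark.scheduler.mode=FAIR on the cluster, `spark` is an existing
# SparkSession, and broker/topic/path values are hypothetical placeholders.


def start_topic_streams(spark, bootstrap_servers, topics, base_path):
    """Start one Kafka -> Delta query per topic, each in its own scheduler pool."""
    queries = []
    for topic in topics:
        # Jobs submitted from this thread after this call go to the named pool,
        # so each query competes for executor slots as a separate pool.
        spark.sparkContext.setLocalProperty("spark.scheduler.pool", f"pool_{topic}")

        source = (
            spark.readStream.format("kafka")
            .option("kafka.bootstrap.servers", bootstrap_servers)
            .option("subscribe", topic)
            .load()
        )

        query = (
            source.writeStream.format("delta")
            .queryName(f"ingest_{topic}")
            # Every query needs its own checkpoint location.
            .option("checkpointLocation", f"{base_path}/_checkpoints/{topic}")
            .start(f"{base_path}/{topic}")
        )
        queries.append(query)

    # Clear the pool so later jobs from this thread use the default pool again.
    spark.sparkContext.setLocalProperty("spark.scheduler.pool", None)
    return queries


# Example call (hypothetical values):
# queries = start_topic_streams(spark, "broker-1:9092", ["orders", "clicks"], "/mnt/bronze")
```

Setting the thread-local `spark.scheduler.pool` property is the mechanism Spark documents for assigning jobs to pools. Pool weights and minimum shares can additionally be defined in a fair scheduler allocation file, which this sketch does not configure.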
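The one-core-per-partition rule from item 3 turns into a small sizing calculation. This is a sketch, not an official sizing formula: the helper name `workers_needed` is hypothetical, and it assumes that when one job consumes several topics (item 1), the partitions of all topics are summed and every worker has the same number of cores.

```python
import math


def workers_needed(partitions_per_topic, cores_per_worker):
    """Number of workers so that every source partition gets its own core.

    Assumes one core per partition, partitions summed across all topics the
    job consumes, and identical worker sizes.
    """
    if cores_per_worker <= 0:
        raise ValueError("cores_per_worker must be positive")
    total_partitions = sum(partitions_per_topic.values())
    # Round up: a partially used worker still has to be provisioned.
    return math.ceil(total_partitions / cores_per_worker)


# Hypothetical job: two topics with 16 and 8 partitions on 8-core workers.
print(workers_needed({"orders": 16, "clicks": 8}, cores_per_worker=8))  # 3
```

Rounding up keeps the rule intact when the partition count is not a multiple of the worker size; for example, 10 partitions on 4-core workers needs 3 workers.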