Anonymous

@Kevin Kim:

The difference in results when using different driver and worker types in Databricks can be due to a number of factors. Here are a few possible explanations:

  1. Resource allocation: General purpose and memory-optimized workers have different amounts of CPU and memory. If the SARIMAX model needs more memory than the general purpose worker provides, the fit may run slower or fail outright. Resource limits by themselves usually change runtime rather than the fitted values. Conversely, if the memory-optimized worker has more resources than the model needs, the extra capacity may not noticeably improve performance.
  2. Parallelism: SARIMAX is computationally intensive and can benefit from parallelism, for example when many models are fitted at once or when the underlying numerical libraries use multiple threads. A node type with more cores may spread that work across more CPU cores, which shortens processing time. Because floating-point operations can then run in a different order, the fitted values may also differ slightly.
  3. Network latency: Databricks workloads move data across the network between the driver and workers. High latency slows every step that transfers data, so it can affect how long the SARIMAX job takes. Latency alone should not change the fitted values.
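
One way to narrow down which of these factors applies is to run the same fit on each cluster type and record the hardware, the runtime, and the fitted parameters side by side. The sketch below is only a minimal illustration using the Python standard library. The helper names `describe_node`, `timed`, and `params_match` are hypothetical and not part of Databricks or statsmodels, and the tolerances are assumptions you would tune for your own model.

```python
import math
import os
import platform
import time


def describe_node():
    """Collect basic facts about the machine this code runs on."""
    return {
        "cpu_count": os.cpu_count(),
        "machine": platform.machine(),
        "python": platform.python_version(),
    }


def timed(fit_fn, *args, **kwargs):
    """Run fit_fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fit_fn(*args, **kwargs)
    return result, time.perf_counter() - start


def params_match(a, b, rel_tol=1e-6, abs_tol=1e-9):
    """True if two parameter vectors agree within the given tolerances.

    The default tolerances are illustrative assumptions, not recommended values.
    """
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for x, y in zip(a, b)
    )
```

On each cluster you could wrap your own fit call in `timed`, for instance a function that returns `list(model.fit(disp=False).params)` for a statsmodels SARIMAX model, and save the output together with `describe_node()`. If `params_match` reports agreement between the two clusters, the difference is most likely one of runtime only. If it does not, the node's CPU and library setup are the next things to compare.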