Different results in Databricks using SARIMAX

ckwan48
New Contributor III

In Databricks, the 11.3 ML runtime gives different results depending on whether I use general-purpose or memory-optimized workers. I used SARIMAX to forecast, but I get different results when I change the driver and worker types to these options. Does anyone know why this happens? Thanks!

4 REPLIES

Anonymous
Not applicable

@Kevin Kim:

The difference in results when using different driver and worker types in Databricks can be due to a number of factors. Here are a few possible explanations:

  1. Resource allocation: General-purpose and memory-optimized workers have different amounts of CPU and memory, and may run on different processor types. A shortage of resources mainly makes the SARIMAX fit slower. Differences in the underlying hardware and the numerical libraries tuned for it, however, can shift floating-point results slightly. Having more memory than the model needs will not significantly improve performance.
  2. Parallelism: SARIMAX relies on linear algebra and numerical optimization routines that can run across multiple CPU cores. Worker types with a different number of cores may split that work differently, which makes processing faster and can also change the order of floating-point operations. Because floating-point addition is not associative, a different order can round differently and produce slightly different results.
  3. Network latency: Databricks workloads involve moving data across the network between the driver and workers. High latency affects how long the job takes, but on its own it should not change the numerical output of the SARIMAX algorithm.
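To make the parallelism point concrete, here is a minimal sketch in plain Python (no Databricks or SARIMAX involved). The helper `chunked_sum` and the generated data are hypothetical; the chunk count only stands in for "number of cores" and is not how any particular library actually divides its work. It shows that adding the same numbers in a different grouping can change the last digits of the result.

```python
import math
import random

# Floating-point addition is not associative: grouping changes rounding.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

def chunked_sum(values, n_chunks):
    """Sum `values` by first computing `n_chunks` partial sums,
    loosely mimicking a reduction that is split across CPU cores."""
    size = math.ceil(len(values) / n_chunks)
    partials = [sum(values[i:i + size]) for i in range(0, len(values), size)]
    return sum(partials)

random.seed(0)
# Values spanning many orders of magnitude make rounding effects visible.
data = [random.uniform(-1, 1) * 10 ** random.randint(-8, 8)
        for _ in range(10_000)]

exact = math.fsum(data)  # correctly rounded reference sum
totals = {n: chunked_sum(data, n) for n in (1, 2, 4, 8, 16)}
for n, total in totals.items():
    print(f"{n:>2} chunks: {total!r}  (difference from fsum: {total - exact:.3e})")
```

An optimizer that evaluates a likelihood many times can amplify differences of this size, especially when the model is fitted on very few data points.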

ckwan48
New Contributor III

Hi @Suteja Kanuri,

Thanks for the response!

I get that SARIMAX could be computationally expensive. However, I'm only training on up to 9 data points, so I personally don't think that would be an issue, though I could be wrong. I do see that SARIMAX has different numerical solvers, and the default one is lbfgs. Do you think this could be the issue?

Also, could you elaborate more on network latency?

Thanks!

Anonymous
Not applicable

@Kevin Kim: I hope the answer below gives you some pointers to think about, test, and try.

The choice of numerical solver can affect both the speed and the accuracy of the optimization when fitting a SARIMAX model. The default solver, lbfgs, is a gradient-based quasi-Newton method that works well for small datasets with relatively few parameters to estimate. For larger datasets or models with more parameters, a different solver such as Newton-Raphson or BFGS may converge better or faster.

Regarding network latency, this refers to the time it takes for data to travel between your Databricks cluster and any other external data sources or services that your code may be accessing. If your code is making frequent or large data requests to an external source, network latency can become a significant bottleneck that affects the overall performance of your computations. To mitigate this, you may want to consider optimizing your code to reduce the number of data requests or to preprocess and cache data locally in your Databricks cluster. Additionally, you may want to consider using a higher-performing network or optimizing your network configuration to reduce latency.

Anonymous
Not applicable

Hi @Kevin Kim,

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 
