Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Different results in Databricks using SARIMAX

ckwan48
New Contributor III

In Databricks, using the 11.3 ML runtime gives different results with general purpose vs. memory-optimized workers. I used SARIMAX to forecast, but I'm getting different results when I change the driver and worker types to these options. Does anyone know why I am getting this issue? Thanks!

4 REPLIES

Anonymous
Not applicable

@Kevin Kim:

The difference in results when using different driver and worker types in Databricks can be due to a number of factors. Here are a few possible explanations:

  1. Resource allocation: General purpose and memory-optimized workers may have different amounts of CPU and memory resources allocated to them. If the SARIMAX model requires more resources than are available on the general purpose worker, it may perform slower or produce different results. Conversely, if the memory-optimized worker has more resources than needed, it may not significantly improve performance.
  2. Parallelism: SARIMAX is a computationally intensive algorithm and can benefit from parallelism. Memory-optimized workers may have more cores available, allowing the algorithm to be parallelized across more CPU cores, resulting in faster processing times and potentially different results.
  3. Network latency: Databricks workloads involve moving data across the network between the driver and workers. If the network latency is high, this can affect the performance of the SARIMAX algorithm and lead to different results.
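As a minimal sketch of how parallelism alone can change a numerical result: floating-point addition is not associative, so splitting the same sum across a different number of workers can change the last digits. This is only an illustration of the general mechanism, not a confirmed cause of the SARIMAX difference in this thread, and the helper names below are hypothetical.

```python
def left_to_right_sum(values):
    """Add values strictly in order, as a single core would."""
    total = 0.0
    for v in values:
        total += v
    return total


def reduce_in_chunks(chunks):
    """Sum each chunk separately (one per hypothetical worker), then combine the partial sums."""
    partials = [left_to_right_sum(chunk) for chunk in chunks]
    return left_to_right_sum(partials)


values = [0.1, 0.2, 0.3]

# Same numbers, grouped differently.
one_worker = reduce_in_chunks([values])              # (0.1 + 0.2) + 0.3
two_workers = reduce_in_chunks([[0.1], [0.2, 0.3]])  # 0.1 + (0.2 + 0.3)

print(repr(one_worker))
print(repr(two_workers))
print("identical:", one_worker == two_workers)
```

The gap here is on the order of one unit in the last place. An iterative optimizer can magnify a difference that small if the likelihood surface is nearly flat, which is plausible (though not verified here) when fitting on very few observations.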

ckwan48
New Contributor III

Hi @Suteja Kanuri,

Thanks for the response!

I get that SARIMAX can be computationally expensive. However, I'm only training on up to 9 data points, so I don't think that would be an issue, though I could be wrong. I do see that SARIMAX has different numerical solvers, and the default one is lbfgs. Do you think this is an issue?

Also, could you elaborate more on network latency?

Thanks!

Anonymous
Not applicable

@Kevin Kim: I hope the answer below gives you some pointers to think about, test, and try.

The choice of numerical solver may affect the speed and accuracy of the optimization process when fitting the SARIMAX model. The default solver, lbfgs, is a gradient-based optimization method that is known to work well for small datasets with relatively few parameters to estimate. However, for larger datasets or models with more parameters, it may be necessary to use a different solver, such as Newton-Raphson or BFGS, to achieve better convergence and faster performance.

Regarding network latency, this refers to the time it takes for data to travel between your Databricks cluster and any other external data sources or services that your code may be accessing. If your code is making frequent or large data requests to an external source, network latency can become a significant bottleneck that affects the overall performance of your computations. To mitigate this, you may want to consider optimizing your code to reduce the number of data requests or to preprocess and cache data locally in your Databricks cluster. Additionally, you may want to consider using a higher-performing network or optimizing your network configuration to reduce latency.

Anonymous
Not applicable

Hi @Kevin Kim

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 
