03-15-2023 12:09 AM
In Databricks, the 11.3 ML runtime gives different results with general-purpose vs. memory-optimized workers. I used SARIMAX to forecast, but I'm getting different results when I change the driver and worker types to these options. Does anyone know why I am getting this issue? Thanks!
03-24-2023 11:35 PM
@Kevin Kim:
The difference in results when using different driver and worker types in Databricks can be due to a number of factors. Here are a few possible explanations:
03-26-2023 08:54 AM
Hi @Suteja Kanuri,
Thanks for the response!
I get that SARIMAX could be computationally expensive; however, I'm only training on up to 9 data points, so I personally don't think that would be an issue. But I could be wrong. I do see that SARIMAX has different numerical solvers, and the default one is lbfgs. Do you think this is an issue?
Also, could you elaborate more on network latency?
Thanks!
04-01-2023 09:01 PM
@Kevin Kim: I hope the answer below gives you some pointers to think about, test, and implement.
The choice of numerical solver may affect the speed and accuracy of the optimization process when fitting the SARIMAX model. The default solver, lbfgs, is a gradient-based (limited-memory quasi-Newton) optimization method that is known to work well for small datasets with relatively few parameters to estimate. However, for larger datasets or models with more parameters, it may be necessary to use a different solver, such as Newton-Raphson (`newton`) or BFGS (`bfgs`), to achieve better convergence and faster performance.
Regarding network latency, this refers to the time it takes for data to travel between your Databricks cluster and any other external data sources or services that your code may be accessing. If your code is making frequent or large data requests to an external source, network latency can become a significant bottleneck that affects the overall performance of your computations. To mitigate this, you may want to consider optimizing your code to reduce the number of data requests or to preprocess and cache data locally in your Databricks cluster. Additionally, you may want to consider using a higher-performing network or optimizing your network configuration to reduce latency.
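As a rough illustration of the caching suggestion, the sketch below wraps a slow lookup in an in-memory cache so that repeated requests for the same key only hit the external source once. `fetch_external`, `fetch_cached`, and the key `"sales_2023"` are hypothetical stand-ins, and the `time.sleep` call only simulates a network round trip; a real job would call its own data source here.

```python
import time
from functools import lru_cache

# Counts how many times the simulated external source is actually hit.
CALLS = {"count": 0}


def fetch_external(key):
    """Hypothetical stand-in for a slow request to an external source."""
    CALLS["count"] += 1
    time.sleep(0.05)  # simulated network round trip
    return (key, len(key))  # immutable result, safe to share from the cache


@lru_cache(maxsize=None)
def fetch_cached(key):
    """Return the cached result if this key was already requested."""
    return fetch_external(key)


start = time.perf_counter()
for _ in range(5):
    row = fetch_cached("sales_2023")
elapsed = time.perf_counter() - start

# Only the first of the five requests pays the simulated latency.
print("external calls:", CALLS["count"], "elapsed: %.2fs" % elapsed)
```

This kind of cache lives in the memory of a single Python process, so it helps with repeated lookups on the driver; data shared across workers would need a different mechanism.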
03-25-2023 03:49 AM
Hi @Kevin Kim,
Thank you for posting your question in our community! We are happy to assist you.
To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?
This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance!