Thank you very much for your detailed analysis and helpful recommendations.
We have reviewed your suggestions, and I’d like to share a quick update:
We have already tried most of the mitigation strategies you mentioned, including increasing driver memory, tuning the garbage collector, refactoring memory-heavy operations, and analyzing driver metrics. However, we have not yet explored off-heap memory allocation (item #4), and we will consider testing this next.
Also, it's worth noting that this issue started occurring only after we switched from GKE to GCE. Previously, our pipelines were running smoothly without any GC-related performance degradation.
Once again, we appreciate your insights and support.
Kind regards,
Hung Nguyen