Garbage Collection optimization

User16826994223 — Tue, 22 Jun 2021 13:08:09 GMT

I have a case where garbage collection is taking a lot of time, and I want to optimize it for better performance.

Re: Garbage Collection optimization

User16826994223 — Tue, 22 Jun 2021 13:12:26 GMT

You can use smaller instances with less RAM instead of VMs with more RAM. However, there is a trade-off if the operation involves a lot of shuffling, because spreading the work across more small-memory VMs will increase the shuffle time.

Re: Garbage Collection optimization

sean_owen — Tue, 22 Jun 2021 16:06:59 GMT

You can also tune the JVM's GC parameters directly, if you mean the pauses are too long. Set them through "spark.executor.extraJavaOptions", but doing so does require knowing a thing or two about how to tune for a given performance goal.
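
As a rough illustration of the setting mentioned above, here is a minimal Python sketch that assembles a spark-submit command passing GC flags to the executor JVMs through "spark.executor.extraJavaOptions". The helper name build_spark_submit, the application file my_job.py, and the specific flag values are assumptions chosen for the example and are not from this thread. The flags themselves are standard HotSpot options, but whether they help depends on the workload and the JVM version.

```python
import shlex


def build_spark_submit(app, gc_flags):
    """Return a spark-submit argv that passes gc_flags to every executor JVM.

    Hypothetical helper for illustration; it only builds the command line.
    """
    java_opts = " ".join(gc_flags)
    return [
        "spark-submit",
        "--conf", f"spark.executor.extraJavaOptions={java_opts}",
        app,
    ]


# Assumed starting values, not recommendations: measure before and after.
GC_FLAGS = [
    "-XX:+UseG1GC",              # use the G1 collector
    "-XX:MaxGCPauseMillis=200",  # pause-time goal (a target, not a guarantee)
    "-verbose:gc",               # log GC activity so the effect can be checked
]

# "my_job.py" is a placeholder application file.
cmd = build_spark_submit("my_job.py", GC_FLAGS)
print(shlex.join(cmd))
```

The GC log output from the executors is what tells you whether a change actually shortened the pauses, so it is probably worth keeping some form of GC logging enabled while experimenting.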