Garbage Collection optimization

User16826994223
Databricks Employee

I have a case where garbage collection is taking a long time, and I want to optimize it for better performance.

User16826994223
Databricks Employee

You can use smaller instances with less RAM instead of VMs with more RAM, since a smaller heap per JVM generally means shorter GC pauses. However, there is a trade-off if the operation involves a lot of shuffling, because spreading the work over more small-memory VMs increases shuffle time.
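
To make the trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes shuffle blocks are spread uniformly across nodes, which real jobs only approximate, and the cluster sizes (1024 GB total, 4 vs. 16 nodes) are made-up numbers for illustration, not a recommendation.

```python
def remote_shuffle_fraction(num_nodes: int) -> float:
    """Fraction of shuffle blocks fetched from another node.

    Assumes blocks are spread uniformly across nodes (an idealization).
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return (num_nodes - 1) / num_nodes


def describe_layout(total_ram_gb: int, num_nodes: int) -> dict:
    """Summarize a hypothetical cluster layout with a fixed total RAM."""
    return {
        "nodes": num_nodes,
        "ram_per_node_gb": total_ram_gb / num_nodes,
        "remote_shuffle_fraction": remote_shuffle_fraction(num_nodes),
    }


# Same total memory, split two ways (illustrative numbers only).
few_large = describe_layout(total_ram_gb=1024, num_nodes=4)
many_small = describe_layout(total_ram_gb=1024, num_nodes=16)

print(few_large)
print(many_small)
```

With the same total memory, the layout with more nodes has less RAM per node (so less heap for each JVM to collect) but a larger share of shuffle data crossing the network under this simple model.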

sean_owen
Databricks Employee

You can also tune the JVM's GC parameters directly, if you mean that the pauses are too long. Set "spark.executor.extraJavaOptions" to do so, though it does require knowing a thing or two about how to tune for your performance goal.
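
As a minimal sketch of what that setting might look like, the snippet below builds a value for "spark.executor.extraJavaOptions" from a few standard HotSpot flags (G1 collector, a pause-time target, an earlier concurrent-cycle trigger, and basic GC logging). The helper name `build_gc_conf` and the default values 200 ms and 35% are assumptions chosen for illustration, not tuned recommendations; the right values depend on your workload.

```python
def build_gc_conf(max_pause_ms: int = 200, ihop_percent: int = 35) -> dict:
    """Build a Spark conf entry with example G1 GC flags.

    Hypothetical helper: the defaults are illustrative starting points only.
    """
    flags = [
        "-XX:+UseG1GC",                                        # use the G1 collector
        f"-XX:MaxGCPauseMillis={max_pause_ms}",                # pause-time goal (a target, not a guarantee)
        f"-XX:InitiatingHeapOccupancyPercent={ihop_percent}",  # start concurrent marking earlier
        "-verbose:gc",                                         # basic GC logging to check the effect
    ]
    return {"spark.executor.extraJavaOptions": " ".join(flags)}


conf = build_gc_conf()
print(conf)

# With PySpark available, each entry could be applied before the session starts:
#   builder = SparkSession.builder
#   for key, value in conf.items():
#       builder = builder.config(key, value)
```

Executor JVM flags take effect only when the executors launch, so on an existing cluster the same key and value would go into the cluster's Spark config instead. Heap size is not set here: Spark expects it to come from "spark.executor.memory" rather than an -Xmx flag in this option.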