
java.lang.OutOfMemoryError: GC overhead limit exceeded [solved]

sarvesh
Contributor III

Solution:

I didn't need to add any executor or driver memory; all I had to do in my case was add this: .option("maxRowsInMemory", 1000).

Before, I couldn't even read a 9 MB file; now I just read a 50 MB file without any error.

val df = spark.read
  .format("com.crealytics.spark.excel")
  .option("maxRowsInMemory", 1000)
  .option("header", "true")
  .load("data/12file.xlsx")
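For anyone landing here later: as I understand the spark-excel docs, setting maxRowsInMemory switches the library to a streaming reader that keeps only that many rows in memory at a time instead of loading the whole workbook into the heap, which is why the GC pressure disappears. The value 1000 is just the window size I picked.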

I am trying to read an 8 MB Excel file and I am getting this error.

I use IntelliJ with Spark 2.4.4, Scala 2.12.12, and JDK 1.8.

This is my code:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf = new SparkConf()
  .set("spark.driver.memory", "4g")
  .set("spark.executor.memory", "6g")
  // .set("spark.executor.cores", "2")

val spark = SparkSession
  .builder
  .appName("trimTest")
  .master("local[*]")
  .config(conf)
  .getOrCreate()

val df = spark.read
  .format("com.crealytics.spark.excel")
  .option("header", "true")
  .load("data/12file.xlsx")

Now, these are my Spark UI screenshots. Can you tell me what the main issue is, and how can I increase the job executor memory?

[Spark UI screenshots: spark ui 1, spark ui 2]

Stack trace:

java.lang.OutOfMemoryError: GC overhead limit exceeded

at java.lang.Class.newReflectionData(Class.java:2511)

at java.lang.Class.reflectionData(Class.java:2503)

at java.lang.Class.privateGetDeclaredConstructors(Class.java:2660)

at java.lang.Class.getConstructor0(Class.java:3075)

at java.lang.Class.newInstance(Class.java:412)

at sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:403)

at sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:394)

at java.security.AccessController.doPrivileged(Native Method)

at sun.reflect.MethodAccessorGenerator.generate(MethodAccessorGenerator.java:393)

at sun.reflect.MethodAccessorGenerator.generateMethod(MethodAccessorGenerator.java:75)

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:53)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:498)

at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:276)

at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:193)

at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:175)

at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:117)

at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:54)

at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)

at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:83)

at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:206)

at javax.management.StandardMBean.getAttribute(StandardMBean.java:372)

at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:647)

at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)

at com.sun.jmx.mbeanserver.MXBeanProxy$GetHandler.invoke(MXBeanProxy.java:122)

at com.sun.jmx.mbeanserver.MXBeanProxy.invoke(MXBeanProxy.java:167)

at javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:258)

at com.sun.proxy.$Proxy8.getMemoryUsed(Unknown Source)

at org.apache.spark.metrics.MBeanExecutorMetricType.getMetricValue(ExecutorMetricType.scala:67)

at org.apache.spark.metrics.SingleValueExecutorMetricType.getMetricValues(ExecutorMetricType.scala:46)

at org.apache.spark.metrics.SingleValueExecutorMetricType.getMetricValues$(ExecutorMetricType.scala:44)

at org.apache.spark.metrics.MBeanExecutorMetricType.getMetricValues(ExecutorMetricType.scala:60)

1 ACCEPTED SOLUTION

Thank you, I just found a solution, and I have mentioned it in my question too. While reading my file, all I had to do was add this:

.option("maxRowsInMemory", 1000)


18 REPLIES

-werners-
Esteemed Contributor III

I doubt it is the 8 MB file.

What happens if you do not set any memory parameter at all? (use the defaults)

It is an 8.5 MB xlsx file with 100k rows of data.

I get the same GC overhead limit exceeded error without adding any parameters.

-werners-
Esteemed Contributor III

My guess is indeed a config issue, as in your Spark script you don't seem to perform any action (Spark is lazily evaluated, so nothing is actually read until an action such as df.show() runs).

As you run Spark locally, chances are the JVM cannot allocate enough RAM for it to run successfully.

Can you check the docs:

https://spark.apache.org/docs/2.4.4/tuning.html#garbage-collection-tuning
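From that page, a first step worth trying is to turn on GC logging so you can see what the collector is doing. As a sketch for JDK 1.8 (these go in the JVM options; the G1 flag is optional):

-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseG1GC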

Yes, I just went through it, and from what I understood I need to increase the heap space, but increasing it at run time with IntelliJ is not working.
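One likely reason the runtime setting does nothing, for whoever reads this later: in local mode the driver JVM is already running by the time SparkConf is applied, so .set("spark.driver.memory", ...) in code has no effect; the heap has to be fixed when the JVM starts. In an IntelliJ Run Configuration that means the VM options field, e.g.:

-Xmx4g

or, when launching through spark-submit instead (the class and jar names here are only placeholders):

spark-submit --master local[2] --driver-memory 4g --class TrimTest trimtest.jar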

-werners-
Esteemed Contributor III

Can you try with local[2] instead of local[*]?

And also beef up the driver memory to, say, 90% of your RAM.

As you run in local mode, the driver and the executor all run in the same process, which is controlled by the driver memory.

So you can skip the executor params.
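As a sketch of that suggestion, a minimal local setup (the heap itself is then set at JVM launch, e.g. via -Xmx in the VM options or --driver-memory, since SparkConf is read after the driver JVM starts):

val spark = SparkSession
  .builder
  .appName("trimTest")
  .master("local[2]")
  .getOrCreate()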

I did; it is still the same. There is something else that I am missing here, and my memory consumption was 7 GB out of the 8 GB available right now.

-werners-
Esteemed Contributor III

You don't do anything else but the spark.read.excel, right?

I am doing df.show(), nothing else.

-werners-
Esteemed Contributor III

Weird, when I run Spark locally, I just install it, do not configure any executor, and it just works.

Did you define any executors by any chance?

No, I didn't. I have also just installed it. I think this is my machine's issue, or something I have not done right.

Hubert-Dudek
Esteemed Contributor III

Can you try without:

.set("spark.driver.memory", "4g")
.set("spark.executor.memory", "6g")

It clearly shows that there is not 4 GB free on the driver and 6 GB free on the executor (you can also share your hardware/cluster details).

You also usually cannot allocate 100% for Spark, as there are other processes too.

Automatic settings are recommended.

[Screenshot from 2021-11-23 17-30-32]

I tried to read it without these configs and I got the same error (GC overhead limit). I am running it locally; these are my specifications.

Hubert-Dudek
Esteemed Contributor III

It seems that you have only 8 GB of RAM (probably 4-6 GB is needed for the system at least), but you allocate 10 GB for Spark (4 GB driver + 6 GB executor).

In my opinion you can allocate at most 2 GB altogether if your RAM is 8 GB. Maybe even 1 GB, as there can also be spikes in system processes.

It would be easier with Docker, as then you allocate your machine a fixed amount of RAM and Spark can consume an exact amount.
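A sketch of that Docker route, assuming the apache/spark image from Docker Hub (which is newer than the Spark 2.4.4 used in this thread) and example limits: cap the container's memory and keep the driver heap below it.

docker run -it --memory 4g apache/spark /opt/spark/bin/spark-shell --driver-memory 2g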

-werners-
Esteemed Contributor III

Yes, that will be it. 8 GB can run Spark, but I'd go for no more than 3 GB, or 2 GB to be on the safe side.

It looks like an Ubuntu install, so it is not as resource hungry as Windows, but 8 GB is not much.

For tinkering around, I always go for Docker (or a VM).
