java.lang.OutOfMemoryError: GC overhead limit exceeded [solved]

sarvesh
Contributor III

Solution:

I didn't need to add any executor or driver memory; all I had to do in my case was add this: .option("maxRowsInMemory", 1000).

Before, I couldn't even read a 9 MB file; now I can read a 50 MB file without any error.

val df = spark.read
  .format("com.crealytics.spark.excel")
  .option("maxRowsInMemory", 1000)
  .option("header", "true")
  .load("data/12file.xlsx")

I am trying to read an 8 MB Excel file and I am getting this error.

I use IntelliJ with Spark 2.4.4, Scala 2.12.12, and JDK 1.8.

This is my code:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf = new SparkConf()
  .set("spark.driver.memory", "4g")
  .set("spark.executor.memory", "6g")
  // .set("spark.executor.cores", "2")

val spark = SparkSession
  .builder
  .appName("trimTest")
  .master("local[*]")
  .config(conf)
  .getOrCreate()

val df = spark.read
  .format("com.crealytics.spark.excel")
  .option("header", "true")
  .load("data/12file.xlsx")

Now, these are my Spark UI screenshots. Can you tell me what the main issue is and how I can increase the job executor memory?


Stack trace:

java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.lang.Class.newReflectionData(Class.java:2511)
    at java.lang.Class.reflectionData(Class.java:2503)
    at java.lang.Class.privateGetDeclaredConstructors(Class.java:2660)
    at java.lang.Class.getConstructor0(Class.java:3075)
    at java.lang.Class.newInstance(Class.java:412)
    at sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:403)
    at sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:394)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.reflect.MethodAccessorGenerator.generate(MethodAccessorGenerator.java:393)
    at sun.reflect.MethodAccessorGenerator.generateMethod(MethodAccessorGenerator.java:75)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:53)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:276)
    at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:193)
    at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:175)
    at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:117)
    at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:54)
    at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
    at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:83)
    at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:206)
    at javax.management.StandardMBean.getAttribute(StandardMBean.java:372)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:647)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)
    at com.sun.jmx.mbeanserver.MXBeanProxy$GetHandler.invoke(MXBeanProxy.java:122)
    at com.sun.jmx.mbeanserver.MXBeanProxy.invoke(MXBeanProxy.java:167)
    at javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:258)
    at com.sun.proxy.$Proxy8.getMemoryUsed(Unknown Source)
    at org.apache.spark.metrics.MBeanExecutorMetricType.getMetricValue(ExecutorMetricType.scala:67)
    at org.apache.spark.metrics.SingleValueExecutorMetricType.getMetricValues(ExecutorMetricType.scala:46)
    at org.apache.spark.metrics.SingleValueExecutorMetricType.getMetricValues$(ExecutorMetricType.scala:44)
    at org.apache.spark.metrics.MBeanExecutorMetricType.getMetricValues(ExecutorMetricType.scala:60)

ACCEPTED SOLUTION

Thank you, I just found a solution and have mentioned it in my question too. While reading my file, all I had to do was add this:

.option("maxRowsInMemory", 1000)


18 REPLIES

-werners-
Esteemed Contributor III

I doubt it is the 8 MB file.

What happens if you do not set any memory parameter at all? (use the defaults)

It is an 8.5 MB xlsx file with 100k rows of data.

I get the same GC overhead limit exceeded error without adding any parameter.

-werners-
Esteemed Contributor III

My guess is indeed a config issue, as your Spark script doesn't seem to perform any action (Spark is lazily evaluated).

As you run Spark locally, chances are the JVM cannot allocate enough RAM for it to run successfully.

Can you check the docs:

https://spark.apache.org/docs/2.4.4/tuning.html#garbage-collection-tuning
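To illustrate the lazy-evaluation point above, here is a minimal sketch using the same path and options as in this thread:

val df = spark.read
  .format("com.crealytics.spark.excel")
  .option("header", "true")
  .load("data/12file.xlsx")   // mostly just builds the plan (some sources may peek at the file for the header/schema)

df.show()   // an action such as show() or count() is what triggers the actual Spark job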

Yes, I just went through it, and from what I understood I need to increase the heap space, but increasing it at runtime with IntelliJ is not working.
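For reference, a minimal sketch of how to verify what heap the driver JVM actually received. The assumption here is a plain IntelliJ run rather than spark-submit; in that case the heap is fixed by the -Xmx VM option in the run configuration (e.g. -Xmx4g) or by launching with spark-submit --driver-memory 4g, and it cannot be grown after the JVM has started:

object HeapCheck {
  def main(args: Array[String]): Unit = {
    // Print the maximum heap the JVM was started with; if this still shows the
    // JVM default (typically around a quarter of physical RAM on JDK 8),
    // the -Xmx setting did not take effect.
    val maxHeapMb = Runtime.getRuntime.maxMemory / (1024 * 1024)
    println(s"Driver JVM max heap: $maxHeapMb MB")
  }
}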

-werners-
Esteemed Contributor III

Can you try with local[2] instead of local[*]?

And also beef up the driver memory, to something like 90% of your RAM.

As you run in local mode, the driver and the executors all run in the same process, which is controlled by the driver memory.

So you can skip the executor params.
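A minimal sketch of that suggestion (the memory value is illustrative; note that spark.driver.memory only takes effect if it is applied before the driver JVM starts, e.g. via spark-submit --driver-memory, so in an IDE run the -Xmx VM option is what really sizes the heap):

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder
  .appName("trimTest")
  .master("local[2]")                    // fewer local threads, less concurrent memory pressure
  .config("spark.driver.memory", "6g")   // driver only; no separate executor settings in local mode
  .getOrCreate()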

I did; it is still the same. There is something else that I am missing here, and my memory consumed was 7 GB out of the 8 GB available right now.

-werners-
Esteemed Contributor III

You don't do anything else but the spark.read.excel, right?

I am doing df.show(), nothing else.

-werners-
Esteemed Contributor III

Weird; when I run Spark locally, I just install it, don't configure any executors, and it just works.

Did you define any executors by any chance?

No, I didn't. I have also just installed it; I think this is my machine's issue, or something I have not done right.

Hubert-Dudek
Esteemed Contributor III

Can you try without:

.set("spark.driver.memory", "4g")

.set("spark.executor.memory", "6g")

It clearly shows that there is not 4 GB free on the driver and 6 GB free on the executor (you can also share your hardware/cluster details).

You also cannot usually allocate 100% of the RAM to Spark, as there are other processes running.

Automatic settings are recommended.

I tried to read it without these configs and I got the same error (GC overhead limit). I am running it locally; these are my specifications (screenshot attached).

Hubert-Dudek
Esteemed Contributor III

It seems that you have only 8 GB of RAM (probably 4-6 GB is needed for the system at least), but you allocate 10 GB for Spark (4 GB driver + 6 GB executor).

In my opinion you can allocate at most 2 GB altogether if your RAM is 8 GB. Maybe even 1 GB, as there can also be spikes in system processes.

It would be easier with Docker, as then you allocate your machine a fixed amount of RAM and Spark can consume an exact amount.

-werners-
Esteemed Contributor III

Yeah, that will be it. 8 GB can run Spark, but I'd go for no more than 3 GB, or 2 GB to be on the safe side.

It looks like an Ubuntu install, so it is not as resource hungry as Windows, but 8 GB is not much.

For tinkering around, I always go for Docker (or a VM).
