Error while optimizing the table
Failure of InSet.sql for UTF8String collection
12-02-2022 01:56 PM
The table holds 1,125,089 rows for the October data, so I am optimizing that range:
optimize table where batchday >= "2022-10-01" and batchday <= "2022-10-31"
I am getting this error: GC overhead limit exceeded
at org.apache.spark.unsafe.types.UTF8String.fromBytes(UTF8String.java:136)
I have scaled the cluster (driver and executors) from 2 worker nodes to 10 worker nodes and increased the memory size from 32 GB.
When I run OPTIMIZE on other batchday values, I do not see any issue.
Could you tell me why the UTF8String.fromBytes exception occurs while optimizing this partition of data?
- Labels: Executor Memory, Table, Worker Nodes
12-03-2022 07:23 AM
Hi @Nagini Sitaraman
Can you try forcing the GC to unload the objects held in the driver's memory? That should solve your issue.
Code to do that:
# Drop all cached tables and DataFrames from the in-memory cache
spark.catalog.clearCache()
# Unpersist every RDD that is still marked as persistent
for (id, rdd) in spark.sparkContext._jsc.getPersistentRDDs().items():
    rdd.unpersist()
    print("Unloaded {} rdd".format(id))
Cheers
12-08-2022 09:11 AM
Hi Uma,
I am running just an OPTIMIZE SQL script:
optimize table where batchday >= "2022-10-01" and batchday <= "2022-10-31"
How should I handle it in this case?
There are 93 files in this batch.
It takes more than 5 hours to optimize.
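One option that nobody in the thread has suggested or confirmed, but that may help narrow the problem down, is to run the OPTIMIZE one day at a time instead of over the whole month. That keeps each job smaller and shows which batchday value triggers the GC error. The sketch below is only an illustration: `my_table` is a placeholder for the real table name, the helper `daily_optimize_statements` is hypothetical, and it assumes `batchday` is a partition column stored as `YYYY-MM-DD`.

```python
from datetime import date, timedelta


def daily_optimize_statements(table, column, start, end):
    """Build one OPTIMIZE statement per day in [start, end], inclusive."""
    statements = []
    day = start
    while day <= end:
        statements.append(
            f"OPTIMIZE {table} WHERE {column} = '{day.isoformat()}'"
        )
        day += timedelta(days=1)
    return statements


# `my_table` is a placeholder; the thread does not give the real table name.
statements = daily_optimize_statements(
    "my_table", "batchday", date(2022, 10, 1), date(2022, 10, 31)
)

# `spark` only exists inside a Spark/Databricks session, so guard the execution.
if "spark" in globals():
    for stmt in statements:
        spark.sql(stmt)
        print("Finished:", stmt)
```

If one particular day fails or runs far longer than the rest, that day's data is the place to look first.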
01-30-2023 08:41 AM
Hi @Nagini Sitaraman, to understand the issue better I would like to get some more information.
Does the error occur on the driver side or the executor side?
Can you please share the full error stack trace?
You may also need to check the Spark UI to find where the bottleneck is, e.g. which phase causes the issue and whether it is memory or something else.