Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Error while optimizing the table: Failure of InSet.sql for UTF8String collection

nagini_sitarama
New Contributor III

The table has 1125089 rows for the October data, so I am optimizing the table.

optimize table where batchday >= "2022-10-01" and batchday <= "2022-10-31"

I am getting an error like: GC overhead limit exceeded

    at org.apache.spark.unsafe.types.UTF8String.fromBytes(UTF8String.java:136)

I have scaled my cluster (driver and executors) from 2 worker nodes to 10 worker nodes, with the memory size increased from 32 GB.

When I run OPTIMIZE on other batchday values, I do not see any issue.

Could you tell me why the UTF8String.fromBytes exception occurs while optimizing a partition of data?

3 REPLIES

UmaMahesh1
Honored Contributor III

Hi @Nagini Sitaraman​ 

Can you try forcing the GC to unload the objects in the driver's memory? That should solve your issue.

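Code to do that:

# Drop all cached tables and DataFrames from the in-memory cache.
spark.catalog.clearCache()
# Unpersist every RDD that is still marked as persistent.
for (rdd_id, rdd) in spark.sparkContext._jsc.getPersistentRDDs().items():
  rdd.unpersist()
  print("Unloaded {} rdd".format(rdd_id))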

Cheers

Uma Mahesh D

Hi Uma,

I am running just an OPTIMIZE SQL script:

optimize table where batchday >= "2022-10-01" and batchday <= "2022-10-31"

How should I handle it in this case?

There are 93 files in this batch.

It takes more than 5 hours to optimize.

Priyanka_Biswas
Esteemed Contributor III

Hi @Nagini Sitaraman, to understand the issue better, I would like to get some more information.

Does the error occur at the driver side or executor side?

Can you please share the full error stack trace?

You may need to check the Spark UI to find where the bottleneck is, e.g. which phase causes the issue, and whether it is a memory issue or something else.
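One possible way to narrow down where the problem sits is to run the OPTIMIZE one batchday at a time instead of over the whole month, so the failing day stands out in the Spark UI. The sketch below only builds the SQL strings. It assumes batchday is the partition column and holds ISO dates; the table name `my_table` and the helper `daily_optimize_statements` are hypothetical placeholders, not names from this thread.

```python
from datetime import date, timedelta


def daily_optimize_statements(table, start, end, column="batchday"):
    """Build one OPTIMIZE statement per day in the inclusive range [start, end].

    `table` and `column` are placeholders; adjust them to the real table.
    """
    statements = []
    day = start
    while day <= end:
        statements.append(
            f"OPTIMIZE {table} WHERE {column} = '{day.isoformat()}'"
        )
        day += timedelta(days=1)
    return statements


# Hypothetical table name; the date range matches the question above.
statements = daily_optimize_statements(
    "my_table", date(2022, 10, 1), date(2022, 10, 31)
)

for stmt in statements:
    print(stmt)
    # In a Databricks notebook, each statement could be run separately:
    # spark.sql(stmt)
```

Running the statements one by one should show whether a single batchday triggers the GC error or whether the whole range is affected. It is a diagnostic step under the stated assumptions, not a confirmed fix.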
