<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Error while optimizing the table .
Failure of InSet.sql for UTF8String collection in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/error-while-optimizing-the-table-failure-of-inset-sql-for/m-p/18959#M12645</link>
    <description>&lt;P&gt;Hi @Nagini Sitaraman,&lt;/P&gt;&lt;P&gt;Can you try clearing the cache and unpersisting any persisted RDDs so that the GC can free the objects held in the driver's memory? That should solve your issue.&lt;/P&gt;&lt;P&gt;Code to do that:&lt;/P&gt;&lt;P&gt;spark.catalog.clearCache()&lt;/P&gt;&lt;P&gt;for (id, rdd) in spark.sparkContext._jsc.getPersistentRDDs().items():&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;rdd.unpersist()&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;print("Unloaded {} rdd".format(id))&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Cheers&lt;/P&gt;</description>
    <pubDate>Sat, 03 Dec 2022 15:23:36 GMT</pubDate>
    <dc:creator>UmaMahesh1</dc:creator>
    <dc:date>2022-12-03T15:23:36Z</dc:date>
    <item>
      <title>Error while optimizing the table .
Failure of InSet.sql for UTF8String collection</title>
      <link>https://community.databricks.com/t5/data-engineering/error-while-optimizing-the-table-failure-of-inset-sql-for/m-p/18958#M12644</link>
      <description>&lt;P&gt;The table has 1125089 rows for the October data, so I am optimizing it.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image.png"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1056iFB8CCFB284BEB39C/image-size/large?v=v2&amp;amp;px=999" role="button" title="image.png" alt="image.png" /&gt;&lt;/span&gt;optimize table where batchday &amp;gt;="2022-10-01" and batchday&amp;lt;="2022-10-31"&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I am getting this error: GC overhead limit exceeded&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;at org.apache.spark.unsafe.types.UTF8String.fromBytes(UTF8String.java:136)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I have scaled the cluster from 2 worker nodes to 10 worker nodes and increased the driver and executor memory from 32 GB.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;When I run OPTIMIZE on other batchday ranges, I do not see any issue.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Could you tell me why the UTF8String.fromBytes exception occurs while optimizing a partition of data?&lt;/P&gt;</description>
      <pubDate>Fri, 02 Dec 2022 21:56:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-while-optimizing-the-table-failure-of-inset-sql-for/m-p/18958#M12644</guid>
      <dc:creator>nagini_sitarama</dc:creator>
      <dc:date>2022-12-02T21:56:23Z</dc:date>
    </item>
    <item>
      <title>Re: Error while optimizing the table .
Failure of InSet.sql for UTF8String collection</title>
      <link>https://community.databricks.com/t5/data-engineering/error-while-optimizing-the-table-failure-of-inset-sql-for/m-p/18959#M12645</link>
      <description>&lt;P&gt;Hi @Nagini Sitaraman,&lt;/P&gt;&lt;P&gt;Can you try clearing the cache and unpersisting any persisted RDDs so that the GC can free the objects held in the driver's memory? That should solve your issue.&lt;/P&gt;&lt;P&gt;Code to do that:&lt;/P&gt;&lt;P&gt;spark.catalog.clearCache()&lt;/P&gt;&lt;P&gt;for (id, rdd) in spark.sparkContext._jsc.getPersistentRDDs().items():&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;rdd.unpersist()&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;print("Unloaded {} rdd".format(id))&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Cheers&lt;/P&gt;</description>
      <pubDate>Sat, 03 Dec 2022 15:23:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-while-optimizing-the-table-failure-of-inset-sql-for/m-p/18959#M12645</guid>
      <dc:creator>UmaMahesh1</dc:creator>
      <dc:date>2022-12-03T15:23:36Z</dc:date>
    </item>
    <item>
      <title>Re: Error while optimizing the table .
Failure of InSet.sql for UTF8String collection</title>
      <link>https://community.databricks.com/t5/data-engineering/error-while-optimizing-the-table-failure-of-inset-sql-for/m-p/18960#M12646</link>
      <description>&lt;P&gt;Hi Uma,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I am only running an OPTIMIZE SQL script:&lt;/P&gt;&lt;P&gt;optimize table where batchday &amp;gt;="2022-10-01" and batchday&amp;lt;="2022-10-31"&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;How should I handle this in that case?&lt;/P&gt;&lt;P&gt;There are 93 files in this batch.&lt;/P&gt;&lt;P&gt;It takes more than 5 hours to optimize.&lt;/P&gt;</description>
      <pubDate>Thu, 08 Dec 2022 17:11:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-while-optimizing-the-table-failure-of-inset-sql-for/m-p/18960#M12646</guid>
      <dc:creator>nagini_sitarama</dc:creator>
      <dc:date>2022-12-08T17:11:41Z</dc:date>
    </item>
    <item>
      <title>Re: Error while optimizing the table .
Failure of InSet.sql for UTF8String collection</title>
      <link>https://community.databricks.com/t5/data-engineering/error-while-optimizing-the-table-failure-of-inset-sql-for/m-p/18961#M12647</link>
      <description>&lt;P&gt;Hi @Nagini Sitaraman, to understand the issue better, I would like some more information.&lt;/P&gt;&lt;P&gt;Does the error occur on the driver side or the executor side?&lt;/P&gt;&lt;P&gt;Can you please share the full error stack trace?&lt;/P&gt;&lt;P&gt;You may need to check the Spark UI to find where the bottleneck is, e.g. which phase causes the issue and whether it is a memory problem or something else.&lt;/P&gt;</description>
      <pubDate>Mon, 30 Jan 2023 16:41:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-while-optimizing-the-table-failure-of-inset-sql-for/m-p/18961#M12647</guid>
      <dc:creator>Priyanka_Biswas</dc:creator>
      <dc:date>2023-01-30T16:41:02Z</dc:date>
    </item>
  </channel>
</rss>

