<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: GC (Metadata GC Threshold) issue in Data Governance</title>
    <link>https://community.databricks.com/t5/data-governance/gc-metadata-gc-threshold-issue/m-p/31690#M912</link>
    <description>&lt;P&gt;Hi @Chandan Angadi&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Just a friendly follow-up. Are you still affected by this error message? Please let us know if we can help.&lt;/P&gt;</description>
    <pubDate>Tue, 05 Apr 2022 23:53:35 GMT</pubDate>
    <dc:creator>jose_gonzalez</dc:creator>
    <dc:date>2022-04-05T23:53:35Z</dc:date>
    <item>
      <title>GC (Metadata GC Threshold) issue</title>
      <link>https://community.databricks.com/t5/data-governance/gc-metadata-gc-threshold-issue/m-p/31684#M906</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I am facing the GC metadata issue while performing distributed computing on Spark. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;2022-01-13T22:02:28.467+0000: [GC (Metadata GC Threshold) [PSYoungGen: 458969K-&amp;gt;18934K(594944K)] 458969K-&amp;gt;18958K(1954816K), 0.0144028 secs] [Times: user=0.05 sys=0.01, real=0.02 secs] &lt;/P&gt;&lt;P&gt;2022-01-13T22:02:28.482+0000: [Full GC (Metadata GC Threshold) [PSYoungGen: 18934K-&amp;gt;0K(594944K)] [ParOldGen: 24K-&amp;gt;17853K(823296K)] 18958K-&amp;gt;17853K(1418240K), [Metaspace: 20891K-&amp;gt;20891K(1067008K)], 0.0201195 secs] [Times: user=0.14 sys=0.01, real=0.02 secs] &lt;/P&gt;&lt;P&gt;2022-01-13T22:02:29.459+0000: [GC (Metadata GC Threshold) [PSYoungGen: 432690K-&amp;gt;84984K(594944K)] 450544K-&amp;gt;105009K(1418240K), 0.0226140 secs] [Times: user=0.17 sys=0.05, real=0.03 secs] &lt;/P&gt;&lt;P&gt;2022-01-13T22:02:29.481+0000: [Full GC (Metadata GC Threshold) [PSYoungGen: 84984K-&amp;gt;0K(594944K)] [ParOldGen: 20025K-&amp;gt;91630K(1360384K)] 105009K-&amp;gt;91630K(1955328K), [Metaspace: 34943K-&amp;gt;34943K(1079296K)], 0.0307833 secs] [Times: user=0.13 sys=0.07, real=0.03 secs] &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Cluster config :&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Nodes - r5.4xlarge (128 GB, 16 cores)&lt;/P&gt;&lt;P&gt;8 Worker nodes &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Spark Config :&lt;/P&gt;&lt;P&gt;&amp;nbsp;spark_home_set("/databricks/spark")&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;config &amp;lt;- spark_config()&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;config$spark.sql.shuffle.partitions = 480&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;config$spark.executor.cores = 5&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;config$spark.executor.memory = "30G"&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;config$spark.rpc.message.maxSize = 
1945&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;config$spark.executor.instances = 24&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;config$spark.driver.memory = "30G"&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;config$spark.sql.execution.arrow.sparkr.enabled = TRUE&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;config$spark.driver.maxResultSize = 0&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;options(sparklyr.sanitize.column.names.verbose = TRUE)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;options(sparklyr.verbose = TRUE)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;options(sparklyr.na.omit.verbose = TRUE)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;options(sparklyr.na.action.verbose = TRUE)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;options(java.parameters = "-Xmx8000m")&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;sc &amp;lt;- spark_connect(method = "databricks", master = "yarn-client", config = config, spark_home = "/databricks/spark")&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Please let me know how to fix this issue. I have tried different approaches but I get the same error every time.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Chandan&lt;/P&gt;</description>
      <pubDate>Fri, 14 Jan 2022 08:16:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-governance/gc-metadata-gc-threshold-issue/m-p/31684#M906</guid>
      <dc:creator>chandan_a_v</dc:creator>
      <dc:date>2022-01-14T08:16:32Z</dc:date>
    </item>
    <item>
      <title>Re: GC (Metadata GC Threshold) issue</title>
      <link>https://community.databricks.com/t5/data-governance/gc-metadata-gc-threshold-issue/m-p/31685#M907</link>
      <description>&lt;P&gt;Hi @Kaniz Fatma​&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If you have any idea regarding this, please let me know.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Chandan &lt;/P&gt;</description>
      <pubDate>Fri, 14 Jan 2022 08:18:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-governance/gc-metadata-gc-threshold-issue/m-p/31685#M907</guid>
      <dc:creator>chandan_a_v</dc:creator>
      <dc:date>2022-01-14T08:18:03Z</dc:date>
    </item>
    <item>
      <title>Re: GC (Metadata GC Threshold) issue</title>
      <link>https://community.databricks.com/t5/data-governance/gc-metadata-gc-threshold-issue/m-p/31686#M908</link>
      <description>&lt;P&gt;Can you try to run a test with a maximally simplified spark_connect (just method and spark_home)?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Additionally, please check the following:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Only the following Databricks Runtime versions are supported:&lt;UL&gt;&lt;LI&gt;Databricks Runtime 9.1 LTS ML, Databricks Runtime 9.1 LTS&lt;/LI&gt;&lt;LI&gt;Databricks Runtime 7.3 LTS ML, Databricks Runtime 7.3 LTS&lt;/LI&gt;&lt;LI&gt;Databricks Runtime 6.4 ML, Databricks Runtime 6.4&lt;/LI&gt;&lt;LI&gt;Databricks Runtime 5.5 LTS ML, Databricks Runtime 5.5 LTS&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;The minor version of your client Python installation must be the same as the minor Python version of your Azure Databricks cluster. The table below shows the Python version installed with each Databricks Runtime.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image.png"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/2178iA989EB4BC3DD8709/image-size/large?v=v2&amp;amp;px=999" role="button" title="image.png" alt="image.png" /&gt;&lt;/span&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 14 Jan 2022 13:18:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-governance/gc-metadata-gc-threshold-issue/m-p/31686#M908</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2022-01-14T13:18:48Z</dc:date>
    </item>
    <item>
      <title>Re: GC (Metadata GC Threshold) issue</title>
      <link>https://community.databricks.com/t5/data-governance/gc-metadata-gc-threshold-issue/m-p/31687#M909</link>
      <description>&lt;P&gt;Hi @Hubert Dudek&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks for the reply. I am running R code; I tried the approach you mentioned and got the same issue.&lt;/P&gt;</description>
      <pubDate>Fri, 14 Jan 2022 15:59:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-governance/gc-metadata-gc-threshold-issue/m-p/31687#M909</guid>
      <dc:creator>chandan_a_v</dc:creator>
      <dc:date>2022-01-14T15:59:01Z</dc:date>
    </item>
    <item>
      <title>Re: GC (Metadata GC Threshold) issue</title>
      <link>https://community.databricks.com/t5/data-governance/gc-metadata-gc-threshold-issue/m-p/31689#M911</link>
      <description>&lt;P&gt;Hi @Chandan Angadi&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Are you getting any other error or warning messages, for example in your log4j or stderr logs?&lt;/P&gt;&lt;P&gt;I would also recommend running your code with the default values, without these settings:&amp;nbsp;&lt;/P&gt;&lt;P&gt;  config &amp;lt;- spark_config()&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;config$spark.sql.shuffle.partitions = 480&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;config$spark.executor.cores = 5&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;config$spark.executor.memory = "30G"&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;config$spark.rpc.message.maxSize = 1945&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;config$spark.executor.instances = 24&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;config$spark.driver.memory = "30G"&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;config$spark.sql.execution.arrow.sparkr.enabled = TRUE&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;config$spark.driver.maxResultSize = 0&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This is just to narrow down whether the message happens with all the default values or not. Some of these Spark configs are not needed in Databricks unless you want to fine-tune your job; in this case we need to make sure your job runs fine, to have a reference point.&lt;/P&gt;</description>
      <pubDate>Thu, 24 Feb 2022 01:04:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-governance/gc-metadata-gc-threshold-issue/m-p/31689#M911</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2022-02-24T01:04:11Z</dc:date>
    </item>
    <item>
      <title>Re: GC (Metadata GC Threshold) issue</title>
      <link>https://community.databricks.com/t5/data-governance/gc-metadata-gc-threshold-issue/m-p/31690#M912</link>
      <description>&lt;P&gt;Hi @Chandan Angadi&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Just a friendly follow-up. Are you still affected by this error message? Please let us know if we can help.&lt;/P&gt;</description>
      <pubDate>Tue, 05 Apr 2022 23:53:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-governance/gc-metadata-gc-threshold-issue/m-p/31690#M912</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2022-04-05T23:53:35Z</dc:date>
    </item>
    <item>
      <title>Re: GC (Metadata GC Threshold) issue</title>
      <link>https://community.databricks.com/t5/data-governance/gc-metadata-gc-threshold-issue/m-p/31691#M913</link>
      <description>&lt;P&gt;Hey @Chandan Angadi​&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Hope you are doing great!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Just checking in. Were you able to resolve your issue? If yes, would you like to mark an answer as best? It would be really helpful for the other members.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;We'd love to hear from you.&lt;/P&gt;</description>
      <pubDate>Thu, 28 Apr 2022 16:46:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-governance/gc-metadata-gc-threshold-issue/m-p/31691#M913</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-04-28T16:46:44Z</dc:date>
    </item>
    <item>
      <title>Re: GC (Metadata GC Threshold) issue</title>
      <link>https://community.databricks.com/t5/data-governance/gc-metadata-gc-threshold-issue/m-p/31692#M914</link>
      <description>&lt;P&gt;Hi @Vartika Nain&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Sorry for the late reply to you and the others; I had some health issues, so I couldn't reply earlier.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Yes, the issue got resolved with the following Spark config.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;conf = spark_config()&lt;/P&gt;&lt;P&gt;conf$sparklyr.apply.packages &amp;lt;- FALSE&lt;/P&gt;&lt;P&gt;sc &amp;lt;- spark_connect(method = "databricks", config = conf)&lt;/P&gt;</description>
      <pubDate>Sat, 30 Apr 2022 18:18:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-governance/gc-metadata-gc-threshold-issue/m-p/31692#M914</guid>
      <dc:creator>chandan_a_v</dc:creator>
      <dc:date>2022-04-30T18:18:59Z</dc:date>
    </item>
    <item>
      <title>Re: GC (Metadata GC Threshold) issue</title>
      <link>https://community.databricks.com/t5/data-governance/gc-metadata-gc-threshold-issue/m-p/31693#M915</link>
      <description>&lt;P&gt;Hi @Jose Gonzalez&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Yes, the issue got resolved with the following Spark config.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;conf = spark_config()&lt;/P&gt;&lt;P&gt;conf$sparklyr.apply.packages &amp;lt;- FALSE&lt;/P&gt;&lt;P&gt;sc &amp;lt;- spark_connect(method = "databricks", config = conf)&lt;/P&gt;</description>
      <pubDate>Sat, 30 Apr 2022 18:20:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-governance/gc-metadata-gc-threshold-issue/m-p/31693#M915</guid>
      <dc:creator>chandan_a_v</dc:creator>
      <dc:date>2022-04-30T18:20:06Z</dc:date>
    </item>
    <item>
      <title>Re: GC (Metadata GC Threshold) issue</title>
      <link>https://community.databricks.com/t5/data-governance/gc-metadata-gc-threshold-issue/m-p/31694#M916</link>
      <description>&lt;P&gt;Hi @Chandan Angadi​&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Hope you are doing well now.&lt;/P&gt;&lt;P&gt;Thanks for getting back to us and sending in your solution. Would you like to mark an answer as best?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Mon, 02 May 2022 13:05:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-governance/gc-metadata-gc-threshold-issue/m-p/31694#M916</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-05-02T13:05:35Z</dc:date>
    </item>
  </channel>
</rss>

