<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: GC Driver Error in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31061#M22589</link>
    <description>&lt;P&gt;Did the JDBC connection from Databricks to Tableau Online work with a simple query?&lt;/P&gt;&lt;P&gt;For example:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;SELECT 'Hello World';&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Thu, 06 Oct 2022 18:51:15 GMT</pubDate>
    <dc:creator>Dooley</dc:creator>
    <dc:date>2022-10-06T18:51:15Z</dc:date>
    <item>
      <title>GC Driver Error</title>
      <link>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31040#M22568</link>
      <description>&lt;P&gt;I am using a Databricks cluster to connect to a Tableau workbook through the JDBC connector. My Tableau workbook has been unable to load because resources are not available through the data connection. I looked at the driver log for my cluster and I see Full GC (Ergonomics) errors and Full GC Allocation errors. How do I resolve this? I've tried increasing the storage of my driver and worker by changing them in my cluster, but that didn't fix it.&lt;/P&gt;</description>
      <pubDate>Fri, 23 Sep 2022 19:43:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31040#M22568</guid>
      <dc:creator>aschiff</dc:creator>
      <dc:date>2022-09-23T19:43:14Z</dc:date>
    </item>
    <item>
      <title>Re: GC Driver Error</title>
      <link>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31041#M22569</link>
      <description>&lt;P&gt;You should be using the &lt;A href="https://docs.databricks.com/integrations/bi/tableau.html?_ga=2.56335593.379196772.1664208222-1640429557.1654010074" alt="https://docs.databricks.com/integrations/bi/tableau.html?_ga=2.56335593.379196772.1664208222-1640429557.1654010074" target="_blank"&gt;Databricks ODBC connector&lt;/A&gt; for Tableau, not a JDBC connector. Once you make that switch, let me know if the error keeps happening.&lt;/P&gt;</description>
      <pubDate>Mon, 26 Sep 2022 21:14:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31041#M22569</guid>
      <dc:creator>Dooley</dc:creator>
      <dc:date>2022-09-26T21:14:58Z</dc:date>
    </item>
    <item>
      <title>Re: GC Driver Error</title>
      <link>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31042#M22570</link>
      <description>&lt;P&gt;I downloaded the ODBC connector from here: &lt;A href="https://www.databricks.com/spark/odbc-drivers-download?_ga=2.75890453.1274928885.1664211300-1388717875.1663615888" target="test_blank"&gt;https://www.databricks.com/spark/odbc-drivers-download?_ga=2.75890453.1274928885.1664211300-1388717875.1663615888&lt;/A&gt; . Now how do I use this with my cluster? I was connecting to the cluster from Tableau through the server host name and HTTP path already to the JDBC so I am just wondering how I connect to the ODBC instead of the JDBC.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Sep 2022 16:41:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31042#M22570</guid>
      <dc:creator>aschiff</dc:creator>
      <dc:date>2022-09-27T16:41:56Z</dc:date>
    </item>
    <item>
      <title>Re: GC Driver Error</title>
      <link>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31043#M22571</link>
      <description>&lt;P&gt;I now also downloaded an ODBC manager and tried creating a User DSN in it but am not too sure where these helpers are going or how to use them. I am following the steps given here: &lt;A href="https://docs.databricks.com/integrations/bi/jdbc-odbc-bi.html#odbc-driver-guide" alt="https://docs.databricks.com/integrations/bi/jdbc-odbc-bi.html#odbc-driver-guide" target="_blank"&gt;https://docs.databricks.com/integrations/bi/jdbc-odbc-bi.html#odbc-driver-guide&lt;/A&gt; .&lt;/P&gt;</description>
      <pubDate>Tue, 27 Sep 2022 20:05:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31043#M22571</guid>
      <dc:creator>aschiff</dc:creator>
      <dc:date>2022-09-27T20:05:23Z</dc:date>
    </item>
    <item>
      <title>Re: GC Driver Error</title>
      <link>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31044#M22572</link>
      <description>&lt;P&gt;Sorry for the constant updates, but I have successfully created an ODBC connector using the Simba Spark Driver. It has my Databricks cluster host name, HTTP path, port number and, for authentication, my Databricks login info. Now that this is established, how do I get Tableau Online (not Tableau Desktop) to connect using this ODBC connection instead of the JDBC connection that is causing the garbage collection error?&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1461iDFA25B69F9CD9790/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 27 Sep 2022 20:28:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31044#M22572</guid>
      <dc:creator>aschiff</dc:creator>
      <dc:date>2022-09-27T20:28:56Z</dc:date>
    </item>
    <item>
      <title>Re: GC Driver Error</title>
      <link>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31045#M22573</link>
      <description>&lt;P&gt;You are going to publish workbooks from Tableau Desktop to Tableau Online. &lt;A href="https://docs.databricks.com/integrations/bi/tableau.html#publish-and-refresh-a-workbook-on-tableau-online" alt="https://docs.databricks.com/integrations/bi/tableau.html#publish-and-refresh-a-workbook-on-tableau-online" target="_blank"&gt;These are the instructions.&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 27 Sep 2022 21:26:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31045#M22573</guid>
      <dc:creator>Dooley</dc:creator>
      <dc:date>2022-09-27T21:26:58Z</dc:date>
    </item>
    <item>
      <title>Re: GC Driver Error</title>
      <link>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31046#M22574</link>
      <description>&lt;P&gt;I don't have Tableau Desktop, I only have access to Tableau Online.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Sep 2022 21:34:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31046#M22574</guid>
      <dc:creator>aschiff</dc:creator>
      <dc:date>2022-09-27T21:34:40Z</dc:date>
    </item>
    <item>
      <title>Re: GC Driver Error</title>
      <link>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31047#M22575</link>
      <description>&lt;P&gt;Okay, what is your cluster configuration? Did you try a bigger cluster and have the same problem? When you ran the connector, did you look at the metrics for your cluster to see the memory usage? What do you see? Also, are you using a spot or an on-demand cluster? What Databricks Runtime version are you using? Are you using a Unity Catalog enabled cluster?&lt;/P&gt;</description>
      <pubDate>Wed, 28 Sep 2022 19:32:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31047#M22575</guid>
      <dc:creator>Dooley</dc:creator>
      <dc:date>2022-09-28T19:32:24Z</dc:date>
    </item>
    <item>
      <title>Re: GC Driver Error</title>
      <link>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31048#M22576</link>
      <description>&lt;P&gt;I am using a cluster with an i3.xlarge worker and an i3.2xlarge driver. I have played around with the larger memory sizes of the cluster and still get the same issue. I don't know how to use the ODBC connector, but when I was connecting before (I guess through JDBC) using the server host name and HTTP path in Tableau, I got errors like the ones in the picture in my cluster's driver log. &lt;span class="lia-inline-image-display-wrapper" image-alt="image.png"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1462iFC520B3DFBEFB37A/image-size/large?v=v2&amp;amp;px=999" role="button" title="image.png" alt="image.png" /&gt;&lt;/span&gt;It just keeps repeating the GC Allocation Failure errors and returns a Full GC Ergonomics error at some point too. When I look at the cluster's metrics every 15 minutes, they only show about 230 KB each time. I'm not sure whether my cluster is spot or on-demand, but the "Instance" field in my configuration is blank, and from my research it looks like that is where it would appear. My Databricks Runtime version is 11.2, and I don't think it is a Unity Catalog cluster; I have not used any AWS cloud capabilities with this cluster.&lt;/P&gt;</description>
      <pubDate>Wed, 28 Sep 2022 19:49:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31048#M22576</guid>
      <dc:creator>aschiff</dc:creator>
      <dc:date>2022-09-28T19:49:08Z</dc:date>
    </item>
    <item>
      <title>Re: GC Driver Error</title>
      <link>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31049#M22577</link>
      <description>&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image.png"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1467i636BEDD7F324A92E/image-size/large?v=v2&amp;amp;px=999" role="button" title="image.png" alt="image.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 28 Sep 2022 20:36:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31049#M22577</guid>
      <dc:creator>aschiff</dc:creator>
      <dc:date>2022-09-28T20:36:27Z</dc:date>
    </item>
    <item>
      <title>Re: GC Driver Error</title>
      <link>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31050#M22578</link>
      <description>&lt;P&gt;One thing I would try is using more workers on memory-optimized instances with more memory to see if that fixes it. I know you played with it a bit, but more workers &amp;amp; larger memory might fix this. But let's take a look at the Spark UI to try to troubleshoot.&lt;/P&gt;&lt;P&gt;(1) Under Stages for the job that produced the error, in the summary metrics, do you see data spill? If you do not see it there, you can look for data spill in the SQL tab: find the SQL query associated with the job number that had the error, then click the query to see the DAG. You can click the + in the boxes to expand the details and check whether there is a data spill in there.&lt;/P&gt;&lt;P&gt;(2) In the Spark UI, go to the JDBC/ODBC connector tab. Do you see data leaving? &lt;/P&gt;&lt;P&gt;(3) Also under Stages, can you take a screenshot of what you see for this job? Can you sort by shuffle read? &lt;/P&gt;&lt;P&gt;(4) Do you see anything cached in storage under the "Storage" tab?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;There is a reference to PSYoungGen, so I believe it didn't have enough memory to allocate for possibly a large object, and thus a GC was triggered by the allocation failure. Did this happen multiple times in a 10s interval?&lt;/P&gt;</description>
      <pubDate>Fri, 30 Sep 2022 00:49:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31050#M22578</guid>
      <dc:creator>Dooley</dc:creator>
      <dc:date>2022-09-30T00:49:18Z</dc:date>
    </item>
    <item>
      <title>Re: GC Driver Error</title>
      <link>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31051#M22579</link>
      <description>&lt;P&gt;I increased the minimum number of workers (out of 8) from 2 to 7. &lt;/P&gt;&lt;P&gt;(1) I see this under Stages in the Spark UI; it doesn't look like it shows a data spill. &lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1463iB00B2BE7857E99A6/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;Then in the SQL tab, it's just blank:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1470iC34453AD89969469/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This is as far as I got before the Spark UI tab itself went blank and stopped loading anything.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1466i27D1F2BDE39DD967/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 30 Sep 2022 16:44:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31051#M22579</guid>
      <dc:creator>aschiff</dc:creator>
      <dc:date>2022-09-30T16:44:53Z</dc:date>
    </item>
    <item>
      <title>Re: GC Driver Error</title>
      <link>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31052#M22580</link>
      <description>&lt;P&gt;After restarting the cluster, I started to see the information:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;(1) Attached is what I see in the Stages and SQL tabs. It doesn't look like there is any data spill. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;(2) How would I see data leaving? I just see session statistics and SQL statistics:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1459iEEA2028D76874CC2/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;(3) I also attached an image (in a Word doc because it was too big to paste here) of what I see under the Stages tab sorted by Shuffle Read.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;(4) Here is what is cached under the Storage tab:&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1501iDFAED09EAC219BEC/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;And here is an image of the PSYoungGen errors, where there are multiple "Times" for each error. sys (system) says 2 seconds, and real says about 50 seconds. This is a lot to understand and look at, so forgive me if I am not providing you with the correct information. &lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1468iD802263D77B43273/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 30 Sep 2022 17:13:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31052#M22580</guid>
      <dc:creator>aschiff</dc:creator>
      <dc:date>2022-09-30T17:13:04Z</dc:date>
    </item>
    <item>
      <title>Re: GC Driver Error</title>
      <link>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31053#M22581</link>
      <description />
      <pubDate>Fri, 30 Sep 2022 17:13:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31053#M22581</guid>
      <dc:creator>aschiff</dc:creator>
      <dc:date>2022-09-30T17:13:22Z</dc:date>
    </item>
    <item>
      <title>Re: GC Driver Error</title>
      <link>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31054#M22582</link>
      <description />
      <pubDate>Fri, 30 Sep 2022 17:13:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31054#M22582</guid>
      <dc:creator>aschiff</dc:creator>
      <dc:date>2022-09-30T17:13:31Z</dc:date>
    </item>
    <item>
      <title>Re: GC Driver Error</title>
      <link>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31055#M22583</link>
      <description>&lt;P&gt;Thanks for the files. &lt;/P&gt;&lt;P&gt;(1) In the "Databricks Shell - Details for Query" file, you need to expand the blocks in the graph and check whether you see any data spill in there.&lt;/P&gt;&lt;P&gt;(2) Can you please post the query you were running when you hit this problem? How much data are you running it on? Did you partition that data in a custom way?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Side note: I saw that one of the stages does a count distinct, and I wanted to draw your attention to the &lt;A href="https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/functions/approx_count_distinct" alt="https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/functions/approx_count_distinct" target="_blank"&gt;Approx Count Distinct SQL&lt;/A&gt; function.&lt;/P&gt;</description>
      <pubDate>Mon, 03 Oct 2022 17:33:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31055#M22583</guid>
      <dc:creator>Dooley</dc:creator>
      <dc:date>2022-10-03T17:33:13Z</dc:date>
    </item>
    <item>
      <title>Re: GC Driver Error</title>
      <link>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31056#M22584</link>
      <description>&lt;P&gt;I tried to get these answers for you, but all the Spark UI tabs are coming up blank/default again. I've tried restarting my cluster to solve the issue like before, but no dice. Could I give you permission to look at my cluster?&lt;/P&gt;</description>
      <pubDate>Mon, 03 Oct 2022 19:41:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31056#M22584</guid>
      <dc:creator>aschiff</dc:creator>
      <dc:date>2022-10-03T19:41:35Z</dc:date>
    </item>
    <item>
      <title>Re: GC Driver Error</title>
      <link>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31057#M22585</link>
      <description>&lt;P&gt;Usually it is blank because the page is still loading. Try waiting to see if it comes up. Also, the cluster needs to be on and active &amp;amp; you need to run the job for it to show up in the Spark UI.&lt;/P&gt;</description>
      <pubDate>Mon, 03 Oct 2022 22:09:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31057#M22585</guid>
      <dc:creator>Dooley</dc:creator>
      <dc:date>2022-10-03T22:09:42Z</dc:date>
    </item>
    <item>
      <title>Re: GC Driver Error</title>
      <link>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31058#M22586</link>
      <description>&lt;P&gt;Also, can you run the command in Tableau that caused the issue and then look at the cluster to identify the job number associated with it? I can see the SQL queries in your Stages tab but they are cut off, so I would like to see the full queries that produce this type of error.&lt;/P&gt;</description>
      <pubDate>Mon, 03 Oct 2022 22:12:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31058#M22586</guid>
      <dc:creator>Dooley</dc:creator>
      <dc:date>2022-10-03T22:12:52Z</dc:date>
    </item>
    <item>
      <title>Re: GC Driver Error</title>
      <link>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31059#M22587</link>
      <description>&lt;P&gt;I thought the query was select * from default.salesforce_export_1_explorium_15sept2022 but it is &lt;U&gt;a lot&lt;/U&gt; more. Attached is the 878-page Word document it took to copy and paste the query. The table I am running select * from is only 700 KB. The job IDs associated with this ridiculous query are 293, 294, and 295.&lt;/P&gt;</description>
      <pubDate>Tue, 04 Oct 2022 00:18:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/gc-driver-error/m-p/31059#M22587</guid>
      <dc:creator>aschiff</dc:creator>
      <dc:date>2022-10-04T00:18:12Z</dc:date>
    </item>
  </channel>
</rss>

