<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Need to see all the records in DeltaTable. Exception - java.lang.OutOfMemoryError: GC overhead limit exceeded in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/need-to-see-all-the-records-in-deltatable-exception-java-lang/m-p/34656#M25389</link>
    <description>&lt;P&gt;Can you please let us know the limit of data that can be stored in a Delta table, Hive table, or Parquet file?&lt;/P&gt;</description>
    <pubDate>Mon, 22 Nov 2021 08:49:31 GMT</pubDate>
    <dc:creator>AzureDatabricks</dc:creator>
    <dc:date>2021-11-22T08:49:31Z</dc:date>
    <item>
      <title>Need to see all the records in DeltaTable. Exception - java.lang.OutOfMemoryError: GC overhead limit exceeded</title>
      <link>https://community.databricks.com/t5/data-engineering/need-to-see-all-the-records-in-deltatable-exception-java-lang/m-p/34651#M25384</link>
      <description>&lt;P&gt;truncate=False is not working on a Delta table:&lt;/P&gt;&lt;P&gt;df_delta.show(df_delta.count(), False)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Cluster size:&lt;/P&gt;&lt;P&gt;Single Node - Standard_F4S - 8 GB memory, 4 cores&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;How much data can we persist in a Delta table (Parquet files), and how fast can we retrieve it?&lt;/P&gt;</description>
      <pubDate>Mon, 22 Nov 2021 07:25:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/need-to-see-all-the-records-in-deltatable-exception-java-lang/m-p/34651#M25384</guid>
      <dc:creator>AzureDatabricks</dc:creator>
      <dc:date>2021-11-22T07:25:29Z</dc:date>
    </item>
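    <!-- A minimal PySpark sketch of the command described in the post above, with an
         assumed Delta table path purely for illustration. show() materialises every
         requested row on the driver, so passing the full count to it on a small
         single-node cluster can exhaust driver memory and raise the GC overhead error.

         from pyspark.sql import SparkSession

         spark = SparkSession.builder.getOrCreate()

         # Hypothetical path; the post does not say where the Delta table lives.
         df_delta = spark.read.format("delta").load("/mnt/delta/processed")

         # The call from the post: asks the driver to render every row, untruncated.
         df_delta.show(df_delta.count(), False)
    -->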
    <item>
      <title>Re: Need to see all the records in DeltaTable. Exception - java.lang.OutOfMemoryError: GC overhead limit exceeded</title>
      <link>https://community.databricks.com/t5/data-engineering/need-to-see-all-the-records-in-deltatable-exception-java-lang/m-p/34652#M25385</link>
      <description>&lt;P&gt;A record count is very easy: first read the Delta table into a DataFrame and then do df.count().&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;How fast: that depends on the cluster and the lineage of the DataFrame (which transformations are applied to it).&lt;/P&gt;&lt;P&gt;There is no way to tell. But a single-node cluster with 4 cores will process 8 threads in parallel, I believe.&lt;/P&gt;&lt;P&gt;So depending on the amount of data, this will return within a few seconds, or half an hour, or more.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The out-of-memory error is odd, as the record count is stored in the table metadata, so it does not take a lot of memory.&lt;/P&gt;&lt;P&gt;What exactly are you trying to do in your code? It seems you are trying to process a lot of data locally, not just get a record count.&lt;/P&gt;</description>
      <pubDate>Mon, 22 Nov 2021 07:45:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/need-to-see-all-the-records-in-deltatable-exception-java-lang/m-p/34652#M25385</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2021-11-22T07:45:49Z</dc:date>
    </item>
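    <!-- A minimal sketch of the suggestion above (read the Delta table into a DataFrame,
         then count), assuming a hypothetical table path. The count returns a single
         number, so it is cheap compared with collecting rows onto the driver.

         from pyspark.sql import SparkSession

         spark = SparkSession.builder.getOrCreate()

         df = spark.read.format("delta").load("/mnt/delta/processed")  # hypothetical path
         print(df.count())  # one number comes back; no rows are rendered on the driver
    -->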
    <item>
      <title>Re: Need to see all the records in DeltaTable. Exception - java.lang.OutOfMemoryError: GC overhead limit exceeded</title>
      <link>https://community.databricks.com/t5/data-engineering/need-to-see-all-the-records-in-deltatable-exception-java-lang/m-p/34653#M25386</link>
      <description>&lt;P&gt;Thank you for your reply.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;We stored our processed data in Delta format.&lt;/P&gt;&lt;P&gt;Now, from a testing point of view, I am reading all the Parquet files into a DataFrame to apply the queries.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Here we tried to see how many records we can display or show in Databricks, so we used the command below, as the normal display gives only the first 232 rows:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;df_delta.show(df_delta.count(), False) -- we are trying to show/read 7 lakh (700,000) records (df_delta.count()) with truncate set to False.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you&lt;/P&gt;</description>
      <pubDate>Mon, 22 Nov 2021 07:55:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/need-to-see-all-the-records-in-deltatable-exception-java-lang/m-p/34653#M25386</guid>
      <dc:creator>SailajaB</dc:creator>
      <dc:date>2021-11-22T07:55:10Z</dc:date>
    </item>
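    <!-- A sketch of the testing workflow described above, with assumed paths. Reading
         the Parquet files into a DataFrame and inspecting a small slice avoids pushing
         all 700,000 rows through show() at once.

         from pyspark.sql import SparkSession

         spark = SparkSession.builder.getOrCreate()

         # Hypothetical location of the processed Parquet files.
         df_delta = spark.read.parquet("/mnt/processed/")

         print(df_delta.count())   # total record count
         df_delta.show(50, False)  # first 50 rows, untruncated
    -->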
    <item>
      <title>Re: Need to see all the records in DeltaTable. Exception - java.lang.OutOfMemoryError: GC overhead limit exceeded</title>
      <link>https://community.databricks.com/t5/data-engineering/need-to-see-all-the-records-in-deltatable-exception-java-lang/m-p/34654#M25387</link>
      <description>&lt;P&gt;OK, I see the problem. The issue is not that Databricks is unable to show these records.&lt;/P&gt;&lt;P&gt;The show command runs on the driver, and for a lot of data this will give errors.&lt;/P&gt;&lt;P&gt;But there is a huge difference between showing data on screen and processing/writing it.&lt;/P&gt;&lt;P&gt;There is a reason there is a limit on the number of records shown, as this is pretty expensive (and cannot run in parallel either).&lt;/P&gt;&lt;P&gt;The display() command defaults to 1000 records, which can be overridden to 100K (or even a million, I can't recall).&lt;/P&gt;</description>
      <pubDate>Mon, 22 Nov 2021 08:00:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/need-to-see-all-the-records-in-deltatable-exception-java-lang/m-p/34654#M25387</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2021-11-22T08:00:11Z</dc:date>
    </item>
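    <!-- A sketch of the alternatives implied by the reply above: cap what is rendered
         on screen, and write the full data set out rather than displaying it. The output
         path is an assumption; display() is the Databricks notebook helper mentioned in
         the reply.

         display(df_delta)                           # notebook rendering, limited rows
         df_delta.limit(1000).show(truncate=False)   # explicit cap, untruncated columns

         # Processing/writing the full data scales across the cluster, unlike show().
         df_delta.write.format("delta").mode("overwrite").save("/mnt/review/full_output")
    -->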
    <item>
      <title>Re: Need to see all the records in DeltaTable. Exception - java.lang.OutOfMemoryError: GC overhead limit exceeded</title>
      <link>https://community.databricks.com/t5/data-engineering/need-to-see-all-the-records-in-deltatable-exception-java-lang/m-p/34655#M25388</link>
      <description>&lt;P&gt;thank you !!!&lt;/P&gt;</description>
      <pubDate>Mon, 22 Nov 2021 08:48:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/need-to-see-all-the-records-in-deltatable-exception-java-lang/m-p/34655#M25388</guid>
      <dc:creator>AzureDatabricks</dc:creator>
      <dc:date>2021-11-22T08:48:37Z</dc:date>
    </item>
    <item>
      <title>Re: Need to see all the records in DeltaTable. Exception - java.lang.OutOfMemoryError: GC overhead limit exceeded</title>
      <link>https://community.databricks.com/t5/data-engineering/need-to-see-all-the-records-in-deltatable-exception-java-lang/m-p/34656#M25389</link>
      <description>&lt;P&gt;Can you please let us know the limit of data that can be stored in a Delta table, Hive table, or Parquet file?&lt;/P&gt;</description>
      <pubDate>Mon, 22 Nov 2021 08:49:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/need-to-see-all-the-records-in-deltatable-exception-java-lang/m-p/34656#M25389</guid>
      <dc:creator>AzureDatabricks</dc:creator>
      <dc:date>2021-11-22T08:49:31Z</dc:date>
    </item>
    <item>
      <title>Re: Need to see all the records in DeltaTable. Exception - java.lang.OutOfMemoryError: GC overhead limit exceeded</title>
      <link>https://community.databricks.com/t5/data-engineering/need-to-see-all-the-records-in-deltatable-exception-java-lang/m-p/34657#M25390</link>
      <description>&lt;P&gt;As Parquet/Delta Lake is designed for big data: a lot! Think billions of records.&lt;/P&gt;&lt;P&gt;I don't think there is a hard limit, only the limits set by the cloud provider (CPU quotas, etc.).&lt;/P&gt;</description>
      <pubDate>Mon, 22 Nov 2021 09:12:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/need-to-see-all-the-records-in-deltatable-exception-java-lang/m-p/34657#M25390</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2021-11-22T09:12:25Z</dc:date>
    </item>
    <item>
      <title>Re: Need to see all the records in DeltaTable. Exception - java.lang.OutOfMemoryError: GC overhead limit exceeded</title>
      <link>https://community.databricks.com/t5/data-engineering/need-to-see-all-the-records-in-deltatable-exception-java-lang/m-p/34658#M25391</link>
      <description>&lt;P&gt;Hi @sujata birajdar,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Did @Werner Stinckens fully answer your question? If so, would you be happy to mark their answer as best so that others can quickly find the solution?&lt;/P&gt;</description>
      <pubDate>Mon, 22 Nov 2021 20:16:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/need-to-see-all-the-records-in-deltatable-exception-java-lang/m-p/34658#M25391</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2021-11-22T20:16:12Z</dc:date>
    </item>
    <item>
      <title>Re: Need to see all the records in DeltaTable. Exception - java.lang.OutOfMemoryError: GC overhead limit exceeded</title>
      <link>https://community.databricks.com/t5/data-engineering/need-to-see-all-the-records-in-deltatable-exception-java-lang/m-p/34659#M25392</link>
      <description>&lt;P&gt;thank you !!!&lt;/P&gt;</description>
      <pubDate>Tue, 23 Nov 2021 03:47:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/need-to-see-all-the-records-in-deltatable-exception-java-lang/m-p/34659#M25392</guid>
      <dc:creator>AzureDatabricks</dc:creator>
      <dc:date>2021-11-23T03:47:01Z</dc:date>
    </item>
  </channel>
</rss>

