<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Write 160M rows with 300 columns into Delta Table using Databricks? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/write-160m-rows-with-300-columns-into-delta-table-using/m-p/17200#M11234</link>
    <description>&lt;P&gt;Hi @govind@dqlabs.ai​&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Just wanted to check in to see whether you were able to resolve your issue, or if you need more help. We'd love to hear from you.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 02 May 2022 15:41:34 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2022-05-02T15:41:34Z</dc:date>
    <item>
      <title>Write 160M rows with 300 columns into Delta Table using Databricks?</title>
      <link>https://community.databricks.com/t5/data-engineering/write-160m-rows-with-300-columns-into-delta-table-using/m-p/17196#M11230</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt; Hi, I am using Databricks to load data from one Delta table into another Delta table.&lt;/P&gt;
&lt;P&gt; I'm using the Simba Spark JDBC connector to pull data from a Delta table in my source instance and write it into a Delta table in my Databricks instance. &lt;/P&gt;
&lt;P&gt; The source has ~160M rows and 300 columns of data. &lt;/P&gt;
&lt;P&gt; While writing into the Delta table in my Databricks instance, I'm getting the following error:&lt;/P&gt;
&lt;P&gt; An error occurred while calling o494.save. org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 4 times, most recent failure: Lost task 0.3 in stage 4.0 (TID 6, 10.82.228.157, executor 8): java.sql.SQLException: [Simba][SparkJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 16 tasks (4.1 GiB) is bigger than spark.driver.maxResultSize 4.0 GiB.&lt;/P&gt;
&lt;P&gt;Also attached the detailed error log here &lt;A href="https://storage/attachments/4393-errorlog.txt" target="_blank"&gt;errorlog.txt&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt; &lt;B&gt;Here is my code snippet for writing into delta table:&lt;/B&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;file_location = '/dbfs/perf_test/sample_file' 
options = { "table_name": 'sample_file', "overwriteSchema": True, "mergeSchema": True } 
df.repartition(8).write.format('delta').mode('overwrite').options(**options).save(file_location)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;My Databricks instance config is:&lt;/B&gt;&lt;/P&gt;&lt;P&gt;r4.2xlarge, 61 GB memory, 8 cores
10 nodes (scales up to 16 nodes)&lt;/P&gt;&lt;P&gt;&lt;B&gt;Here is my Spark config:&lt;/B&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max 2047m
spark.scheduler.mode FAIR
spark.executor.cores 8
spark.executor.memory 42g
spark.driver.maxResultSize 0 (tried with 0 or 50g)
spark.driver.memory 42g
spark.driver.cores 8
&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I also tried setting spark.driver.maxResultSize to 0 and to 50g, but neither helped.&lt;/P&gt;
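&lt;P&gt;One direction I'm sketching (unverified; it assumes the source table has a numeric key column, here called &lt;CODE&gt;id&lt;/CODE&gt;, and uses placeholder connection values) is to partition the JDBC read itself, so each executor pulls its own bounded range of rows rather than the whole result set being serialized back through a single query:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;# Sketch only: partitioned JDBC read with placeholder URL and table name.
# Spark issues numPartitions parallel range-bounded queries instead of one big one,
# so no single task has to hold (or return) the full result.
jdbc_url = "SOURCE_JDBC_URL"  # placeholder for the Simba Spark JDBC connection URL

df = (spark.read.format("jdbc")
      .option("url", jdbc_url)
      .option("dbtable", "source_table")   # placeholder source table name
      .option("partitionColumn", "id")     # assumed numeric key column
      .option("lowerBound", "1")
      .option("upperBound", "160000000")   # ~160M rows
      .option("numPartitions", "64")       # ~2.5M rows per partition
      .option("fetchsize", "10000")        # rows fetched per round trip
      .load())

df.write.format("delta").mode("overwrite").save(file_location)
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Dropping the final repartition(8) would also let the write parallelism follow the read partitions instead of squeezing everything into 8 tasks.&lt;/P&gt;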
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 28 Jul 2021 14:49:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/write-160m-rows-with-300-columns-into-delta-table-using/m-p/17196#M11230</guid>
      <dc:creator>govind</dc:creator>
      <dc:date>2021-07-28T14:49:40Z</dc:date>
    </item>
    <item>
      <title>Re: Write 160M rows with 300 columns into Delta Table using Databricks?</title>
      <link>https://community.databricks.com/t5/data-engineering/write-160m-rows-with-300-columns-into-delta-table-using/m-p/17199#M11233</link>
      <description>&lt;P&gt;Hi @govind@dqlabs.ai​&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Have you tried removing "repartition(8)"? Is there a reason why you only want to have 8 partitions?&lt;/P&gt;</description>
      <pubDate>Mon, 07 Mar 2022 23:17:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/write-160m-rows-with-300-columns-into-delta-table-using/m-p/17199#M11233</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2022-03-07T23:17:36Z</dc:date>
    </item>
    <item>
      <title>Re: Write 160M rows with 300 columns into Delta Table using Databricks?</title>
      <link>https://community.databricks.com/t5/data-engineering/write-160m-rows-with-300-columns-into-delta-table-using/m-p/17200#M11234</link>
      <description>&lt;P&gt;Hi @govind@dqlabs.ai​&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Just wanted to check in to see whether you were able to resolve your issue, or if you need more help. We'd love to hear from you.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 02 May 2022 15:41:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/write-160m-rows-with-300-columns-into-delta-table-using/m-p/17200#M11234</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-05-02T15:41:34Z</dc:date>
    </item>
  </channel>
</rss>

