<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: query takes too long to write into delta table. in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/query-takes-too-long-to-write-into-delta-table/m-p/40823#M27248</link>
    <description>&lt;P&gt;I wonder if you have already looked at the SQL plan to see which phase is taking the most time.&lt;/P&gt;</description>
    <pubDate>Mon, 21 Aug 2023 17:24:04 GMT</pubDate>
    <dc:creator>Lakshay</dc:creator>
    <dc:date>2023-08-21T17:24:04Z</dc:date>
    <item>
      <title>query takes too long to write into delta table.</title>
      <link>https://community.databricks.com/t5/data-engineering/query-takes-too-long-to-write-into-delta-table/m-p/40181#M27142</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I am running into an issue while trying to write data into a Delta table. The query is a join between 3 tables; it takes 5 minutes to fetch the data but 3 hours to write the data into the table. The select has 700 records.&lt;/P&gt;&lt;P&gt;Here are the approaches I tested:&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;Shared cluster&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;3h&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;Isolated cluster&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;2.88h&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;External table + parquet + compression "ZSTD"&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;2.63h&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;Adjusting table properties: 'delta.targetFileSize' = '256mb',&lt;BR /&gt;'delta.tuneFileSizesForRewrites' = 'true'&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;2.9h&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;Bucket insert (batches of 100M records each)&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;Too long; I had to cancel it&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;Partitioning&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;Not an option&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;Cluster summary&lt;BR /&gt;1-15 Workers: 140-2,100 GB Memory, 20-300 Cores&lt;BR /&gt;1 Driver: 140 GB Memory, 20 Cores&lt;BR /&gt;Runtime: 12.2.x-scala2.12&lt;/P&gt;</description>
      <pubDate>Thu, 17 Aug 2023 08:35:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/query-takes-too-long-to-write-into-delta-table/m-p/40181#M27142</guid>
      <dc:creator>Axatar</dc:creator>
      <dc:date>2023-08-17T08:35:28Z</dc:date>
    </item>
    <item>
      <title>Re: query takes too long to write into delta table.</title>
      <link>https://community.databricks.com/t5/data-engineering/query-takes-too-long-to-write-into-delta-table/m-p/40185#M27144</link>
      <description>&lt;P&gt;Thank you for your prompt response. Here is more context on the issue.&lt;/P&gt;&lt;P&gt;The table that I am writing data into gets truncated every time I run my script (it's used as a staging table), which means that I am inserting into an empty table every time.&lt;/P&gt;</description>
      <pubDate>Thu, 17 Aug 2023 09:07:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/query-takes-too-long-to-write-into-delta-table/m-p/40185#M27144</guid>
      <dc:creator>Axatar</dc:creator>
      <dc:date>2023-08-17T09:07:13Z</dc:date>
    </item>
    <item>
      <title>Re: query takes too long to write into delta table.</title>
      <link>https://community.databricks.com/t5/data-engineering/query-takes-too-long-to-write-into-delta-table/m-p/40823#M27248</link>
      <description>&lt;P&gt;I wonder if you have already looked at the SQL plan to see which phase is taking the most time.&lt;/P&gt;</description>
      <pubDate>Mon, 21 Aug 2023 17:24:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/query-takes-too-long-to-write-into-delta-table/m-p/40823#M27248</guid>
      <dc:creator>Lakshay</dc:creator>
      <dc:date>2023-08-21T17:24:04Z</dc:date>
    </item>
    <item>
      <title>Re: query takes too long to write into delta table.</title>
      <link>https://community.databricks.com/t5/data-engineering/query-takes-too-long-to-write-into-delta-table/m-p/40910#M27255</link>
      <description>&lt;P&gt;It turned out that the issue was not on the writing side, even though I was getting the results in under 5 min. The issue was in the cross join in my query. I resolved it by doing the same cross joins via DataFrames, and got the results computed and written in 17 min.&lt;/P&gt;</description>
      <pubDate>Tue, 22 Aug 2023 09:30:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/query-takes-too-long-to-write-into-delta-table/m-p/40910#M27255</guid>
      <dc:creator>Axatar</dc:creator>
      <dc:date>2023-08-22T09:30:07Z</dc:date>
    </item>
  </channel>
</rss>

