<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: dataframe takes unusually long time to save as a delta table using sql for a very small dataset in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/dataframe-takes-unusually-long-time-to-save-as-a-delta-table/m-p/41224#M27315</link>
    <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/69645"&gt;@Jfoxyyc&lt;/a&gt;&amp;nbsp;I am having a similar problem and came across this post. Does VNet injection cause this? My workspace is set up that way.&lt;/P&gt;</description>
    <pubDate>Wed, 23 Aug 2023 20:35:34 GMT</pubDate>
    <dc:creator>Fadhi</dc:creator>
    <dc:date>2023-08-23T20:35:34Z</dc:date>
    <item>
      <title>dataframe takes unusually long time to save as a delta table using sql for a very small dataset with 30k rows. It takes around 2hrs. Is there a solution for this problem?</title>
      <link>https://community.databricks.com/t5/data-engineering/dataframe-takes-unusually-long-time-to-save-as-a-delta-table/m-p/17139#M11189</link>
      <description>&lt;P&gt;I am trying to save a DataFrame to a Delta table after a series of data manipulations using UDFs. I tried this code:&lt;/P&gt;&lt;P&gt;(&lt;/P&gt;&lt;P&gt;&amp;nbsp;df&lt;/P&gt;&lt;P&gt;&amp;nbsp;.write&lt;/P&gt;&lt;P&gt;&amp;nbsp;.format('delta')&lt;/P&gt;&lt;P&gt;&amp;nbsp;.mode('overwrite')&lt;/P&gt;&lt;P&gt;&amp;nbsp;.option('overwriteSchema', 'true')&lt;/P&gt;&lt;P&gt;&amp;nbsp;.saveAsTable('output_table')&lt;/P&gt;&lt;P&gt;)&lt;/P&gt;&lt;P&gt;but this takes more than 2 hours. So I registered the DataFrame as a local SQL temp view and tried saving it as a Delta table from that view. This worked for one of the notebooks (14 minutes), but in the other notebooks it still takes around 2 hours to write to the Delta table. I am not sure why this happens for such a small dataset. Any solution is appreciated.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Code:&lt;/P&gt;&lt;P&gt;df.createOrReplaceTempView("sql_temp_view")&lt;/P&gt;&lt;P&gt;%sql&lt;/P&gt;&lt;P&gt;DROP TABLE IF EXISTS default.output_version_2;&lt;/P&gt;&lt;P&gt;create table default.output_version_2&lt;/P&gt;&lt;P&gt;select * from sql_temp_view&lt;/P&gt;</description>
      <pubDate>Tue, 13 Dec 2022 06:14:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dataframe-takes-unusually-long-time-to-save-as-a-delta-table/m-p/17139#M11189</guid>
      <dc:creator>suresh1122</dc:creator>
      <dc:date>2022-12-13T06:14:09Z</dc:date>
    </item>
    <item>
      <title>Re: dataframe takes unusually long time to save as a delta table using sql for a very small dataset with 30k rows. It takes around 2hrs. Is there a solution for this problem?</title>
      <link>https://community.databricks.com/t5/data-engineering/dataframe-takes-unusually-long-time-to-save-as-a-delta-table/m-p/17140#M11190</link>
      <description>&lt;P&gt;What cluster config are you using? Also, what sort of transformations are applied before your final dataframe is created?&lt;/P&gt;</description>
      <pubDate>Tue, 13 Dec 2022 06:43:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dataframe-takes-unusually-long-time-to-save-as-a-delta-table/m-p/17140#M11190</guid>
      <dc:creator>UmaMahesh1</dc:creator>
      <dc:date>2022-12-13T06:43:42Z</dc:date>
    </item>
    <item>
      <title>Re: dataframe takes unusually long time to save as a delta table using sql for a very small dataset with 30k rows. It takes around 2hrs. Is there a solution for this problem?</title>
      <link>https://community.databricks.com/t5/data-engineering/dataframe-takes-unusually-long-time-to-save-as-a-delta-table/m-p/17141#M11191</link>
      <description>&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="Screenshot (232)"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1008i867B23CE5A1A8AAB/image-size/large?v=v2&amp;amp;px=999" role="button" title="Screenshot (232)" alt="Screenshot (232)" /&gt;&lt;/span&gt;This is the cluster config. The transformations are things like data cleanup using filters and search operations using dictionaries.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 13 Dec 2022 06:59:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dataframe-takes-unusually-long-time-to-save-as-a-delta-table/m-p/17141#M11191</guid>
      <dc:creator>suresh1122</dc:creator>
      <dc:date>2022-12-13T06:59:30Z</dc:date>
    </item>
    <item>
      <title>Re: dataframe takes unusually long time to save as a delta table using sql for a very small dataset with 30k rows. It takes around 2hrs. Is there a solution for this problem?</title>
      <link>https://community.databricks.com/t5/data-engineering/dataframe-takes-unusually-long-time-to-save-as-a-delta-table/m-p/17142#M11192</link>
      <description>&lt;P&gt;Can you also share the number of partitions the df has?&lt;/P&gt;&lt;P&gt;You can use df.rdd.getNumPartitions().&lt;/P&gt;</description>
      <pubDate>Tue, 13 Dec 2022 07:23:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dataframe-takes-unusually-long-time-to-save-as-a-delta-table/m-p/17142#M11192</guid>
      <dc:creator>UmaMahesh1</dc:creator>
      <dc:date>2022-12-13T07:23:44Z</dc:date>
    </item>
    <item>
      <title>Re: dataframe takes unusually long time to save as a delta table using sql for a very small dataset with 30k rows. It takes around 2hrs. Is there a solution for this problem?</title>
      <link>https://community.databricks.com/t5/data-engineering/dataframe-takes-unusually-long-time-to-save-as-a-delta-table/m-p/17143#M11193</link>
      <description>&lt;P&gt;Hi @Suresh Kakarlapudi, what is your file size?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 13 Dec 2022 07:31:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dataframe-takes-unusually-long-time-to-save-as-a-delta-table/m-p/17143#M11193</guid>
      <dc:creator>Ajay-Pandey</dc:creator>
      <dc:date>2022-12-13T07:31:25Z</dc:date>
    </item>
    <item>
      <title>Re: dataframe takes unusually long time to save as a delta table using sql for a very small dataset with 30k rows. It takes around 2hrs. Is there a solution for this problem?</title>
      <link>https://community.databricks.com/t5/data-engineering/dataframe-takes-unusually-long-time-to-save-as-a-delta-table/m-p/17144#M11194</link>
      <description>&lt;P&gt;96 partitions&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 13 Dec 2022 07:45:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dataframe-takes-unusually-long-time-to-save-as-a-delta-table/m-p/17144#M11194</guid>
      <dc:creator>suresh1122</dc:creator>
      <dc:date>2022-12-13T07:45:40Z</dc:date>
    </item>
    <item>
      <title>Re: dataframe takes unusually long time to save as a delta table using sql for a very small dataset with 30k rows. It takes around 2hrs. Is there a solution for this problem?</title>
      <link>https://community.databricks.com/t5/data-engineering/dataframe-takes-unusually-long-time-to-save-as-a-delta-table/m-p/17145#M11195</link>
      <description>&lt;P&gt;35 MB&lt;/P&gt;</description>
      <pubDate>Tue, 13 Dec 2022 09:00:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dataframe-takes-unusually-long-time-to-save-as-a-delta-table/m-p/17145#M11195</guid>
      <dc:creator>suresh1122</dc:creator>
      <dc:date>2022-12-13T09:00:47Z</dc:date>
    </item>
    <item>
      <title>Re: dataframe takes unusually long time to save as a delta table using sql for a very small dataset with 30k rows. It takes around 2hrs. Is there a solution for this problem?</title>
      <link>https://community.databricks.com/t5/data-engineering/dataframe-takes-unusually-long-time-to-save-as-a-delta-table/m-p/17146#M11196</link>
      <description>&lt;P&gt;Since the data is so small, try repartitioning it before you write, using repartition or coalesce.&lt;/P&gt;</description>
      <pubDate>Tue, 13 Dec 2022 10:23:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dataframe-takes-unusually-long-time-to-save-as-a-delta-table/m-p/17146#M11196</guid>
      <dc:creator>UmaMahesh1</dc:creator>
      <dc:date>2022-12-13T10:23:26Z</dc:date>
    </item>
    <item>
      <title>Re: dataframe takes unusually long time to save as a delta table using sql for a very small dataset with 30k rows. It takes around 2hrs. Is there a solution for this problem?</title>
      <link>https://community.databricks.com/t5/data-engineering/dataframe-takes-unusually-long-time-to-save-as-a-delta-table/m-p/17147#M11197</link>
      <description>&lt;P&gt;I have a similar issue. The number of partitions is 1 at the table level, and the only transformations applied are type conversions such as date, decimal(20, 2), etc. using withColumn. The cluster has 5 worker nodes.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;180,890 records take 10 minutes. How can I improve the performance, and how can I find where the time is being spent?&lt;/P&gt;</description>
      <pubDate>Wed, 04 Jan 2023 09:40:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dataframe-takes-unusually-long-time-to-save-as-a-delta-table/m-p/17147#M11197</guid>
      <dc:creator>Sreekanth_N</dc:creator>
      <dc:date>2023-01-04T09:40:16Z</dc:date>
    </item>
    <item>
      <title>Re: dataframe takes unusually long time to save as a delta table using sql for a very small dataset with 30k rows. It takes around 2hrs. Is there a solution for this problem?</title>
      <link>https://community.databricks.com/t5/data-engineering/dataframe-takes-unusually-long-time-to-save-as-a-delta-table/m-p/17148#M11198</link>
      <description>&lt;P&gt;Is your Databricks workspace set up with VNet injection, by any chance?&lt;/P&gt;</description>
      <pubDate>Thu, 05 Jan 2023 05:15:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dataframe-takes-unusually-long-time-to-save-as-a-delta-table/m-p/17148#M11198</guid>
      <dc:creator>Jfoxyyc</dc:creator>
      <dc:date>2023-01-05T05:15:30Z</dc:date>
    </item>
    <item>
      <title>Re: dataframe takes unusually long time to save as a delta table using sql for a very small dataset</title>
      <link>https://community.databricks.com/t5/data-engineering/dataframe-takes-unusually-long-time-to-save-as-a-delta-table/m-p/41224#M27315</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/69645"&gt;@Jfoxyyc&lt;/a&gt;&amp;nbsp;I am having a similar problem and came across this post. Does VNet injection cause this? My workspace is set up that way.&lt;/P&gt;</description>
      <pubDate>Wed, 23 Aug 2023 20:35:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dataframe-takes-unusually-long-time-to-save-as-a-delta-table/m-p/41224#M27315</guid>
      <dc:creator>Fadhi</dc:creator>
      <dc:date>2023-08-23T20:35:34Z</dc:date>
    </item>
    <item>
      <title>Re: dataframe takes unusually long time to save as a delta table using sql for a very small dataset</title>
      <link>https://community.databricks.com/t5/data-engineering/dataframe-takes-unusually-long-time-to-save-as-a-delta-table/m-p/41366#M27344</link>
      <description>&lt;P&gt;You should also check the SQL plan to see whether the write phase is really the part that takes the time. Since Spark uses lazy evaluation, some other phase might be the slow one.&lt;/P&gt;</description>
      <pubDate>Thu, 24 Aug 2023 14:57:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dataframe-takes-unusually-long-time-to-save-as-a-delta-table/m-p/41366#M27344</guid>
      <dc:creator>Lakshay</dc:creator>
      <dc:date>2023-08-24T14:57:45Z</dc:date>
    </item>
    <item>
      <title>Re: dataframe takes unusually long time to save as a delta table using sql for a very small dataset</title>
      <link>https://community.databricks.com/t5/data-engineering/dataframe-takes-unusually-long-time-to-save-as-a-delta-table/m-p/89313#M37759</link>
      <description>&lt;P&gt;I am having the same issue: read/write takes a long time, around 10 hours, and the data size was 21 GB.&lt;/P&gt;</description>
      <pubDate>Tue, 10 Sep 2024 14:46:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dataframe-takes-unusually-long-time-to-save-as-a-delta-table/m-p/89313#M37759</guid>
      <dc:creator>jaga2</dc:creator>
      <dc:date>2024-09-10T14:46:47Z</dc:date>
    </item>
  </channel>
</rss>

