<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Is it reasonable for the process "Determining the location of DBIO file fragments." to take me 7 hours? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/is-it-reasonable-for-the-process-quot-determining-the-location/m-p/26097#M18220</link>
    <description>&lt;P&gt;I only have 1000 columns. Each column has 252 rows, so there are only 252,000 data points.&lt;/P&gt;&lt;P&gt;Why would routing tasks for the best cached locality take 7 hours?&lt;/P&gt;</description>
    <pubDate>Mon, 24 Oct 2022 14:56:23 GMT</pubDate>
    <dc:creator>Dicer</dc:creator>
    <dc:date>2022-10-24T14:56:23Z</dc:date>
    <item>
      <title>Is it reasonable for the process "Determining the location of DBIO file fragments." to take me 7 hours?</title>
      <link>https://community.databricks.com/t5/data-engineering/is-it-reasonable-for-the-process-quot-determining-the-location/m-p/26097#M18220</link>
      <description>&lt;P&gt;I only have 1000 columns. Each column has 252 rows, so there are only 252,000 data points.&lt;/P&gt;&lt;P&gt;Why would routing tasks for the best cached locality take 7 hours?&lt;/P&gt;</description>
      <pubDate>Mon, 24 Oct 2022 14:56:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/is-it-reasonable-for-the-process-quot-determining-the-location/m-p/26097#M18220</guid>
      <dc:creator>Dicer</dc:creator>
      <dc:date>2022-10-24T14:56:23Z</dc:date>
    </item>
    <item>
      <title>Re: Is it reasonable for the process "Determining the location of DBIO file fragments." to take me 7 hours?</title>
      <link>https://community.databricks.com/t5/data-engineering/is-it-reasonable-for-the-process-quot-determining-the-location/m-p/26099#M18222</link>
      <description>&lt;P&gt;I tried the following settings, and the process still ran for more than 10 hours until it failed with &lt;B&gt;Fatal error: The Python kernel is unresponsive.&lt;/B&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;%sql
--Enable Auto Optimization
set spark.databricks.delta.properties.defaults.autoOptimize.optimizeWrite = true;
set spark.databricks.delta.properties.defaults.autoOptimize.autoCompact = true;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 25 Oct 2022 13:50:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/is-it-reasonable-for-the-process-quot-determining-the-location/m-p/26099#M18222</guid>
      <dc:creator>Dicer</dc:creator>
      <dc:date>2022-10-25T13:50:11Z</dc:date>
    </item>
    <item>
      <title>Re: Is it reasonable for the process "Determining the location of DBIO file fragments." to take me 7 hours?</title>
      <link>https://community.databricks.com/t5/data-engineering/is-it-reasonable-for-the-process-quot-determining-the-location/m-p/26100#M18223</link>
      <description>&lt;P&gt;@Kaniz Fatma Is there any way to shorten the runtime of the &lt;B&gt;"Determining the location of DBIO file fragments."&lt;/B&gt; process?&lt;/P&gt;</description>
      <pubDate>Mon, 31 Oct 2022 18:52:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/is-it-reasonable-for-the-process-quot-determining-the-location/m-p/26100#M18223</guid>
      <dc:creator>Dicer</dc:creator>
      <dc:date>2022-10-31T18:52:20Z</dc:date>
    </item>
    <item>
      <title>Re: Is it reasonable for the process "Determining the location of DBIO file fragments." to take me 7 hours?</title>
      <link>https://community.databricks.com/t5/data-engineering/is-it-reasonable-for-the-process-quot-determining-the-location/m-p/26101#M18224</link>
      <description>&lt;P&gt;Hi @Cheuk Hin Christophe Poon&lt;/P&gt;&lt;P&gt;Hope all is well! Just checking in: were you able to resolve your issue? If so, would you be happy to share the solution or &lt;B&gt;mark an answer as best&lt;/B&gt;? Otherwise, please let us know if you need more help.&lt;/P&gt;&lt;P&gt;We'd love to hear from you.&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Sun, 27 Nov 2022 14:17:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/is-it-reasonable-for-the-process-quot-determining-the-location/m-p/26101#M18224</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-11-27T14:17:09Z</dc:date>
    </item>
    <item>
      <title>Re: Is it reasonable for the process "Determining the location of DBIO file fragments." to take me 7 hours?</title>
      <link>https://community.databricks.com/t5/data-engineering/is-it-reasonable-for-the-process-quot-determining-the-location/m-p/26102#M18225</link>
      <description>&lt;P&gt;Hi @Cheuk Hin Christophe Poon, have you optimized your table at any time since its creation? If not, OPTIMIZE may take some time, depending on the number of underlying files.&lt;/P&gt;&lt;P&gt;Please try running OPTIMIZE manually, as described in the document below:&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/sql/language-manual/delta-optimize.html" target="test_blank"&gt;https://docs.databricks.com/sql/language-manual/delta-optimize.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;If this doesn't help, you can try disabling the DBIO cache by setting the following in your notebook:&lt;/P&gt;&lt;P&gt;spark.conf.set("spark.databricks.io.cache.enabled", "false")&lt;/P&gt;</description>
      <pubDate>Wed, 30 Nov 2022 15:01:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/is-it-reasonable-for-the-process-quot-determining-the-location/m-p/26102#M18225</guid>
      <dc:creator>Noopur_Nigam</dc:creator>
      <dc:date>2022-11-30T15:01:42Z</dc:date>
    </item>
  </channel>
</rss>

