<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Debugging difference between &quot;task time&quot; and execution time for SQL query in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/debugging-difference-between-quot-task-time-quot-and-execution/m-p/96122#M39219</link>
    <description>&lt;P&gt;I have a large, fairly complex SQL query that performs many joins on CTEs. Because of the nature of the data, these have to be cross joins, which I suspect is why it is slow. I was hoping to pinpoint where tasks are waiting for available nodes, or where the query spends most of its wall-clock time. I tried the query profiler, but it seems to show only the execution time of the tasks, not the whole process.&lt;/P&gt;</description>
    <pubDate>Fri, 25 Oct 2024 11:01:51 GMT</pubDate>
    <dc:creator>nengen</dc:creator>
    <dc:date>2024-10-25T11:01:51Z</dc:date>
    <item>
      <title>Debugging difference between "task time" and execution time for SQL query</title>
      <link>https://community.databricks.com/t5/data-engineering/debugging-difference-between-quot-task-time-quot-and-execution/m-p/95769#M39168</link>
      <description>&lt;P&gt;I have a pretty large SQL query that has the following stats from the query profiler:&lt;/P&gt;&lt;P&gt;Tasks total time: 1.93s&lt;/P&gt;&lt;P&gt;Executing: 27s&lt;/P&gt;&lt;P&gt;Based on the information in the query profiler this can be due to tasks waiting for available nodes.&lt;/P&gt;&lt;P&gt;How should I approach this to figure out where this is happening?&lt;/P&gt;</description>
      <pubDate>Wed, 23 Oct 2024 18:35:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/debugging-difference-between-quot-task-time-quot-and-execution/m-p/95769#M39168</guid>
      <dc:creator>nengen</dc:creator>
      <dc:date>2024-10-23T18:35:34Z</dc:date>
    </item>
    <item>
      <title>Re: Debugging difference between "task time" and execution time for SQL query</title>
      <link>https://community.databricks.com/t5/data-engineering/debugging-difference-between-quot-task-time-quot-and-execution/m-p/95770#M39169</link>
      <description>&lt;P&gt;Hi nengen,&lt;BR /&gt;&lt;BR /&gt;Could you share more information so we can help you?&lt;/P&gt;</description>
      <pubDate>Wed, 23 Oct 2024 19:37:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/debugging-difference-between-quot-task-time-quot-and-execution/m-p/95770#M39169</guid>
      <dc:creator>Stefan-Koch</dc:creator>
      <dc:date>2024-10-23T19:37:48Z</dc:date>
    </item>
    <item>
      <title>Re: Debugging difference between "task time" and execution time for SQL query</title>
      <link>https://community.databricks.com/t5/data-engineering/debugging-difference-between-quot-task-time-quot-and-execution/m-p/96122#M39219</link>
      <description>&lt;P&gt;I have a large, fairly complex SQL query that performs many joins on CTEs. Because of the nature of the data, these have to be cross joins, which I suspect is why it is slow. I was hoping to pinpoint where tasks are waiting for available nodes, or where the query spends most of its wall-clock time. I tried the query profiler, but it seems to show only the execution time of the tasks, not the whole process.&lt;/P&gt;</description>
      <pubDate>Fri, 25 Oct 2024 11:01:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/debugging-difference-between-quot-task-time-quot-and-execution/m-p/96122#M39219</guid>
      <dc:creator>nengen</dc:creator>
      <dc:date>2024-10-25T11:01:51Z</dc:date>
    </item>
    <item>
      <title>Re: Debugging difference between "task time" and execution time for SQL query</title>
      <link>https://community.databricks.com/t5/data-engineering/debugging-difference-between-quot-task-time-quot-and-execution/m-p/96139#M39222</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/128929"&gt;@nengen&lt;/a&gt; Try using &lt;STRONG&gt;EXPLAIN EXTENDED&lt;/STRONG&gt;, which provides a detailed breakdown of the logical and physical plans of a query in Spark SQL.&lt;/P&gt;&lt;P&gt;Based on the &lt;STRONG&gt;EXPLAIN EXTENDED&lt;/STRONG&gt; output, here are a few things to consider:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;Broadcast Exchange:&lt;/STRONG&gt; If the join causes data skew, consider switching to a sort-merge join.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;FileScan:&lt;/STRONG&gt; If the scan is slow, consider partitioning or caching the data to improve performance.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Filter Pushdown:&lt;/STRONG&gt; Ensure the most restrictive filters are applied early to reduce the amount of data processed.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/sql-ref-syntax-qry-explain" target="_self"&gt;See the EXPLAIN documentation for more details&lt;/A&gt;&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 25 Oct 2024 13:34:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/debugging-difference-between-quot-task-time-quot-and-execution/m-p/96139#M39222</guid>
      <dc:creator>Panda</dc:creator>
      <dc:date>2024-10-25T13:34:56Z</dc:date>
    </item>
  </channel>
</rss>

