<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Optimizing Task Execution Time on Databricks Serverless Compute in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/optimizing-task-execution-time-on-databricks-serverless-compute/m-p/110101#M43489</link>
    <description>&lt;H3&gt;Question:&lt;/H3&gt;&lt;P&gt;To reduce cluster startup times, I am trying out the serverless compute option when triggering workflows, as a proof of concept. I've noticed that a simple PySpark DataFrame creation task completes in 40-50 seconds. However, when multiple requests are queued for the same task on serverless compute, the execution time for the 2nd and 3rd requests increases to 1.5 to 3 minutes.&lt;/P&gt;&lt;P&gt;According to the query history tab, each task takes only 3-5 seconds to complete, which indicates that significant time is spent on scheduling and resource allocation. How can I reduce this overhead to achieve a total processing time of under 10 seconds per request?&lt;/P&gt;&lt;P&gt;Please note that I do not want concurrent runs for this use case; I depend on the queue for strictly sequential FIFO execution.&lt;/P&gt;</description>
    <pubDate>Thu, 13 Feb 2025 10:55:12 GMT</pubDate>
    <dc:creator>dmadh</dc:creator>
    <dc:date>2025-02-13T10:55:12Z</dc:date>
    <item>
      <title>Optimizing Task Execution Time on Databricks Serverless Compute</title>
      <link>https://community.databricks.com/t5/data-engineering/optimizing-task-execution-time-on-databricks-serverless-compute/m-p/110101#M43489</link>
      <description>&lt;H3&gt;Question:&lt;/H3&gt;&lt;P&gt;To reduce cluster startup times, I am trying out the serverless compute option when triggering workflows, as a proof of concept. I've noticed that a simple PySpark DataFrame creation task completes in 40-50 seconds. However, when multiple requests are queued for the same task on serverless compute, the execution time for the 2nd and 3rd requests increases to 1.5 to 3 minutes.&lt;/P&gt;&lt;P&gt;According to the query history tab, each task takes only 3-5 seconds to complete, which indicates that significant time is spent on scheduling and resource allocation. How can I reduce this overhead to achieve a total processing time of under 10 seconds per request?&lt;/P&gt;&lt;P&gt;Please note that I do not want concurrent runs for this use case; I depend on the queue for strictly sequential FIFO execution.&lt;/P&gt;</description>
      <pubDate>Thu, 13 Feb 2025 10:55:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/optimizing-task-execution-time-on-databricks-serverless-compute/m-p/110101#M43489</guid>
      <dc:creator>dmadh</dc:creator>
      <dc:date>2025-02-13T10:55:12Z</dc:date>
    </item>
    <item>
      <title>Re: Optimizing Task Execution Time on Databricks Serverless Compute</title>
      <link>https://community.databricks.com/t5/data-engineering/optimizing-task-execution-time-on-databricks-serverless-compute/m-p/110113#M43491</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/146467"&gt;@dmadh&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;At the moment there isn't a direct way to improve this. Our engineering team is working on a "speed optimized" feature and a "warm pool", but these aren't available yet.&lt;/P&gt;</description>
      <pubDate>Thu, 13 Feb 2025 12:47:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/optimizing-task-execution-time-on-databricks-serverless-compute/m-p/110113#M43491</guid>
      <dc:creator>Alberto_Umana</dc:creator>
      <dc:date>2025-02-13T12:47:38Z</dc:date>
    </item>
  </channel>
</rss>

