<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Job compute is taking longer even after using pool in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/job-compute-is-taking-longer-even-after-using-pool/m-p/114539#M44864</link>
    <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/145555"&gt;@Isi&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;Yeah this helps. Thanks a lot.&lt;/P&gt;</description>
    <pubDate>Fri, 04 Apr 2025 16:03:35 GMT</pubDate>
    <dc:creator>bhargavabasava</dc:creator>
    <dc:date>2025-04-04T16:03:35Z</dc:date>
    <item>
      <title>Job compute is taking longer even after using pool</title>
      <link>https://community.databricks.com/t5/data-engineering/job-compute-is-taking-longer-even-after-using-pool/m-p/114389#M44800</link>
      <description>&lt;P&gt;Hi team,&lt;/P&gt;&lt;P&gt;We created a workflow and attached it to a job cluster that is configured to use a compute pool. When we run the pipeline, the cluster takes up to 5 minutes to reach the clusterReady state, which adds latency to our use case. Even on subsequent runs, the job waits for the cluster to be ready. Can someone please help me understand how to reduce the overall latency, and whether there is a better way to use job compute?&lt;/P&gt;&lt;P&gt;We also tried serverless compute (non-SQL), and it adds around 20-25 seconds of latency to each task in the job. In the screenshot (Screenshot 2025-04-03 ar 3:43:23 PM), the task took 33 seconds, but the notebook cell ran for only 16 seconds. We would like to understand what adds the extra latency in this case.&lt;/P&gt;&lt;P&gt;Thanks &amp;amp; Regards,&lt;/P&gt;&lt;P&gt;Bhargava&lt;/P&gt;</description>
      <pubDate>Thu, 03 Apr 2025 10:19:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/job-compute-is-taking-longer-even-after-using-pool/m-p/114389#M44800</guid>
      <dc:creator>bhargavabasava</dc:creator>
      <dc:date>2025-04-03T10:19:35Z</dc:date>
    </item>
    <item>
      <title>Re: Job compute is taking longer even after using pool</title>
      <link>https://community.databricks.com/t5/data-engineering/job-compute-is-taking-longer-even-after-using-pool/m-p/114411#M44814</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/153649"&gt;@bhargavabasava&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P class=""&gt;&lt;STRONG&gt;Job Cluster + Compute Pools: Long Startup Times&lt;/STRONG&gt;&lt;/P&gt;&lt;P class=""&gt;If you’re using Job Clusters backed by compute pools, the initial delay (~5 minutes) is usually due to &lt;SPAN class=""&gt;&lt;STRONG&gt;cluster provisioning&lt;/STRONG&gt;&lt;/SPAN&gt;. While compute pools are designed to reduce cold start times by pre-warming VMs, startup latency can still occur if:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P class=""&gt;There are &lt;SPAN class=""&gt;&lt;STRONG&gt;no idle VMs available in the pool&lt;/STRONG&gt;&lt;/SPAN&gt; (e.g., 0 instances in the idle state), so new VMs have to be provisioned from the cloud provider.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;The cluster needs to &lt;SPAN class=""&gt;&lt;STRONG&gt;install libraries or run init scripts&lt;/STRONG&gt;&lt;/SPAN&gt;, which adds to the boot time.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P class=""&gt;&lt;STRONG&gt;Serverless Jobs Latency (~20–25 seconds overhead)&lt;/STRONG&gt;&lt;/P&gt;&lt;P class=""&gt;The behavior you’re seeing, where the notebook logic takes 16 seconds but the task duration is 33 seconds, is expected when using &lt;SPAN class=""&gt;&lt;STRONG&gt;Serverless compute for Jobs (non-SQL)&lt;/STRONG&gt;&lt;/SPAN&gt;. There is a small but consistent per-task overhead due to orchestration, environment setup, and logging.&lt;/P&gt;&lt;P class=""&gt;That said, serverless jobs generally start much faster than job clusters and offer more &lt;SPAN class=""&gt;predictable latency&lt;/SPAN&gt;, so a 20–25 second overhead is considered normal.&lt;/P&gt;&lt;P class=""&gt;&lt;STRONG&gt;Suggestions to Reduce Latency&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P class=""&gt;Use &lt;SPAN class=""&gt;&lt;STRONG&gt;instance pools&lt;/STRONG&gt;&lt;/SPAN&gt; with &lt;SPAN class=""&gt;&lt;STRONG&gt;Idle Instance Auto Termination&lt;/STRONG&gt;&lt;/SPAN&gt; set to ~10 minutes. This allows VMs to be reused across runs that start within that window, without incurring full provisioning times.&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P class=""&gt;If you’re using isolated job clusters, try to &lt;SPAN class=""&gt;chain multiple tasks&lt;/SPAN&gt; in a single job using dependencies, with the tasks configured to share the same job cluster. This way, only the &lt;SPAN class=""&gt;&lt;STRONG&gt;first task pays the cold-start penalty&lt;/STRONG&gt;&lt;/SPAN&gt;, and the following tasks run on the same cluster.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P class=""&gt;&lt;BR /&gt;Hope this helps &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;BR /&gt;&lt;BR /&gt;Isi&lt;/P&gt;</description>
      <pubDate>Thu, 03 Apr 2025 14:22:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/job-compute-is-taking-longer-even-after-using-pool/m-p/114411#M44814</guid>
      <dc:creator>Isi</dc:creator>
      <dc:date>2025-04-03T14:22:37Z</dc:date>
    </item>
    <item>
      <title>Re: Job compute is taking longer even after using pool</title>
      <link>https://community.databricks.com/t5/data-engineering/job-compute-is-taking-longer-even-after-using-pool/m-p/114539#M44864</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/145555"&gt;@Isi&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;Yeah this helps. Thanks a lot.&lt;/P&gt;</description>
      <pubDate>Fri, 04 Apr 2025 16:03:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/job-compute-is-taking-longer-even-after-using-pool/m-p/114539#M44864</guid>
      <dc:creator>bhargavabasava</dc:creator>
      <dc:date>2025-04-04T16:03:35Z</dc:date>
    </item>
    <item>
      <title>Re: Job compute is taking longer even after using pool</title>
      <link>https://community.databricks.com/t5/data-engineering/job-compute-is-taking-longer-even-after-using-pool/m-p/114541#M44866</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/153649"&gt;@bhargavabasava&lt;/a&gt;&amp;nbsp;,&lt;BR /&gt;&lt;BR /&gt;Happy to hear that! Consider marking my answer as the solution to help future users &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;Isi&lt;/P&gt;</description>
      <pubDate>Fri, 04 Apr 2025 16:30:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/job-compute-is-taking-longer-even-after-using-pool/m-p/114541#M44866</guid>
      <dc:creator>Isi</dc:creator>
      <dc:date>2025-04-04T16:30:10Z</dc:date>
    </item>
  </channel>
</rss>

