<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Databricks notebook taking too long to run as a job compared to when triggered from within the notebook in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/databricks-notebook-taking-too-long-to-run-as-a-job-compared-to/m-p/23161#M15953</link>
    <description>&lt;P&gt;I don't know whether this question has been covered before, but here goes: I have a notebook that I can run either manually, using the 'Run' button in the notebook, or as a job.&lt;/P&gt;&lt;P&gt;When I run it directly from within the notebook, the runtime is roughly 2 hours. When I execute it as a job, the runtime is much longer (around 8 hours).&lt;/P&gt;&lt;P&gt;The slowest piece of code calls applyInPandas, which in turn calls a pandas_udf that trains an auto_arima model (pmdarima).&lt;/P&gt;&lt;P&gt;Can anyone help me figure out what might be happening? I am at a loss.&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
    <pubDate>Mon, 11 Apr 2022 09:03:18 GMT</pubDate>
    <dc:creator>curious-case-of</dc:creator>
    <dc:date>2022-04-11T09:03:18Z</dc:date>
    <item>
      <title>Databricks notebook taking too long to run as a job compared to when triggered from within the notebook</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-notebook-taking-too-long-to-run-as-a-job-compared-to/m-p/23161#M15953</link>
      <description>&lt;P&gt;I don't know whether this question has been covered before, but here goes: I have a notebook that I can run either manually, using the 'Run' button in the notebook, or as a job.&lt;/P&gt;&lt;P&gt;When I run it directly from within the notebook, the runtime is roughly 2 hours. When I execute it as a job, the runtime is much longer (around 8 hours).&lt;/P&gt;&lt;P&gt;The slowest piece of code calls applyInPandas, which in turn calls a pandas_udf that trains an auto_arima model (pmdarima).&lt;/P&gt;&lt;P&gt;Can anyone help me figure out what might be happening? I am at a loss.&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Mon, 11 Apr 2022 09:03:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-notebook-taking-too-long-to-run-as-a-job-compared-to/m-p/23161#M15953</guid>
      <dc:creator>curious-case-of</dc:creator>
      <dc:date>2022-04-11T09:03:18Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks notebook taking too long to run as a job compared to when triggered from within the notebook</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-notebook-taking-too-long-to-run-as-a-job-compared-to/m-p/23164#M15956</link>
      <description>&lt;P&gt;We're seeing the same behavior: good performance on an interactive cluster, but poor performance on an identically sized job cluster.&lt;/P&gt;&lt;P&gt;Any ideas?&lt;/P&gt;</description>
      <pubDate>Thu, 09 Jun 2022 13:34:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-notebook-taking-too-long-to-run-as-a-job-compared-to/m-p/23164#M15956</guid>
      <dc:creator>wvl</dc:creator>
      <dc:date>2022-06-09T13:34:08Z</dc:date>
    </item>
  </channel>
</rss>

