Databricks notebook taking too long to run as a job compared to when triggered from within the notebook

curious-case-of — Mon, 11 Apr 2022 09:03:18 GMT

I don't know if this question has been covered before, but here goes: I have a notebook that I can run either manually, using the 'Run' button in the notebook, or as a job.

When I run it directly from within the notebook, the runtime is roughly 2 hours. But when I execute it as a job, the runtime is much longer (around 8 hours).

The slowest part of the code is an applyInPandas call, which in turn calls a pandas_udf that trains an auto_arima model (pmdarima).
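
One way to start narrowing this down is to record the runtime environment in both run modes and compare the two. Below is a minimal sketch using only the Python standard library. The helper names `snapshot_environment` and `diff_snapshots` are hypothetical, and the thread-limit environment variables are only a guess at what could differ between the two runs, not a confirmed cause.

```python
import json
import os
import platform
import sys


def snapshot_environment(
    env_keys=("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"),
):
    """Collect a few runtime facts that can differ between run modes."""
    return {
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
        "cpu_count": os.cpu_count(),
        # Assumption: thread-limit variables are worth checking; they are
        # only examples and may not be relevant to this slowdown.
        "env": {key: os.environ.get(key) for key in env_keys},
    }


def diff_snapshots(a, b, prefix=""):
    """Return {dotted_key: (value_a, value_b)} for every value that differs."""
    differences = {}
    for key in sorted(set(a) | set(b)):
        left, right = a.get(key), b.get(key)
        path = f"{prefix}{key}"
        if isinstance(left, dict) and isinstance(right, dict):
            # Recurse into nested dictionaries such as the "env" section.
            differences.update(diff_snapshots(left, right, prefix=f"{path}."))
        elif left != right:
            differences[path] = (left, right)
    return differences


if __name__ == "__main__":
    # Print the snapshot so it can be copied out of each run's output.
    print(json.dumps(snapshot_environment(), indent=2))
```

Running `snapshot_environment()` once interactively and once in the job, then passing both results to `diff_snapshots`, lists only the values that differ. This covers the driver only; the workers that execute the applyInPandas function would need the same check from inside that function.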

Can anyone help me figure out what might be happening? I am clueless.

Thanks!

Re: Databricks notebook taking too long to run as a job compared to when triggered from within the notebook

wvl — Thu, 09 Jun 2022 13:34:08 GMT

We're seeing the same behavior. Performance is good on an interactive cluster.

On an identically sized job cluster, performance is bad.

Any ideas?
