Databricks notebook taking too long to run as a job compared to when triggered from within the notebook
04-11-2022 02:03 AM
I don't know whether this has been asked before, but here goes: I have a notebook that I can run either manually, using the 'Run' button in the notebook, or as a job.
When I run it directly from the notebook, the runtime is roughly 2 hours. When I execute it as a job, the runtime is around 8 hours.
The slowest piece of code is an applyInPandas call, which in turn calls a pandas_udf that trains an auto_arima model (pmdarima).
Can anyone help me figure out what might be happening? I am clueless.
Thanks!
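To narrow down where the time goes, the per-group function can report its own timing. The following is a minimal sketch, not the actual notebook code: `fit_group`, the `series_id` and `y` columns, and the sample data are all hypothetical, and a simple mean stands in for the auto_arima fit so the example runs with pandas alone.

```python
import time

import pandas as pd


def fit_group(pdf: pd.DataFrame) -> pd.DataFrame:
    """Fit one series and report how long the fit took.

    The mean "forecast" is only a placeholder; in the real notebook this
    is where the auto_arima training would happen.
    """
    start = time.perf_counter()
    forecast = float(pdf["y"].mean())  # placeholder for model training
    elapsed = time.perf_counter() - start
    return pd.DataFrame(
        {
            "series_id": [pdf["series_id"].iloc[0]],
            "n_rows": [len(pdf)],
            "forecast": [forecast],
            "fit_seconds": [elapsed],
        }
    )


# Local check with plain pandas (hypothetical sample data).
sample = pd.DataFrame(
    {"series_id": ["a", "a", "b", "b", "b"], "y": [1.0, 3.0, 2.0, 4.0, 6.0]}
)
timings = pd.concat(
    [fit_group(group) for _, group in sample.groupby("series_id")],
    ignore_index=True,
)

# On Spark, the same function would be passed to applyInPandas, e.g.:
# result = sdf.groupBy("series_id").applyInPandas(
#     fit_group,
#     schema="series_id string, n_rows long, forecast double, fit_seconds double",
# )
```

Comparing the `fit_seconds` values from an interactive run with those from a job run should indicate whether each individual fit is slower or whether fewer groups are being processed in parallel.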
- Labels: Databricks notebook, Notebook, Pandas_udf
06-09-2022 06:34 AM
We're seeing the same behavior: good performance on an interactive cluster, but poor performance on an identically sized job cluster.
Any ideas?
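One way to check whether the two clusters really are configured the same is to dump the Spark configuration from each run and compare them. A minimal sketch, assuming the settings were collected in each environment with something like `spark.sparkContext.getConf().getAll()` (which returns key/value pairs); the `diff_conf` helper and the values below are hypothetical, for illustration only:

```python
def diff_conf(interactive, job):
    """Return {key: (interactive_value, job_value)} for settings that differ.

    Both arguments are iterables of (key, value) pairs. A key missing on
    one side is reported with None for that side.
    """
    a, b = dict(interactive), dict(job)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


# Hypothetical settings, not taken from a real cluster.
interactive_conf = [
    ("spark.sql.shuffle.partitions", "200"),
    ("spark.executor.cores", "4"),
]
job_conf = [
    ("spark.sql.shuffle.partitions", "8"),
    ("spark.executor.cores", "4"),
    ("spark.task.cpus", "4"),
]

differences = diff_conf(interactive_conf, job_conf)
for key, (interactive_value, job_value) in differences.items():
    print(f"{key}: interactive={interactive_value} job={job_value}")
```

The same comparison could be applied to runtime versions and installed library versions, which can also differ between clusters of the same size.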

