
Mysterious simultaneous long-running Databricks Workflows

timothy_uk
New Contributor III

Hi,

This happened across four seemingly unrelated workflows at the same time of day, and all four eventually completed successfully. All of them appeared to sit idle after being triggered via the Jobs API. I have observed two symptoms: for three of the workflows, Databricks only initiated cluster creation about 3 hours(!) after the request was issued via the Jobs API; for the fourth, a cluster was created promptly but the run then idled on a task/notebook for about 3 hours, even though none of the individual cells reported a duration longer than a few seconds.

In a nutshell, the workflows weren't delayed by processing, reading, writing, or any other work. They all appeared to suddenly sit idle for roughly 3 hours at the same time.
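For context, this is roughly the timing check I plan to run against one of the affected runs, to see whether the 3-hour gap shows up as setup (cluster acquisition) or execution time. It's a minimal sketch assuming the Jobs 2.1 runs/get endpoint and a personal access token; the workspace host and run id are placeholders.

    # Minimal sketch (Python + requests), assuming Jobs API 2.1 and a PAT in
    # the DATABRICKS_TOKEN environment variable. HOST and RUN_ID are placeholders.
    import os
    import requests

    HOST = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder workspace URL
    RUN_ID = 123456  # placeholder run id returned when the job was triggered

    resp = requests.get(
        f"{HOST}/api/2.1/jobs/runs/get",
        headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"},
        params={"run_id": RUN_ID},
    )
    resp.raise_for_status()
    run = resp.json()

    # start_time is when the run was created; setup_duration covers cluster
    # acquisition, so a ~3h setup_duration would point at provisioning rather
    # than the notebook code itself. All durations are in milliseconds.
    # For multi-task runs, the same fields appear per task under run["tasks"].
    print("state:             ", run.get("state"))
    print("start_time (ms):   ", run.get("start_time"))
    print("setup_duration:    ", run.get("setup_duration"))
    print("execution_duration:", run.get("execution_duration"))
    print("cleanup_duration:  ", run.get("cleanup_duration"))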

The driver logs and log4j output don't reveal anything. We are on Azure using spot instances, so I wondered whether it could be related to eviction; however, nothing in the logs suggests that was happening. Could Azure simply have been slow to provision compute?
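If spot capacity does turn out to be the culprit, one option I'm considering is falling back to on-demand VMs, along the lines of the hypothetical job cluster spec below (the azure_attributes field names are from the Azure cluster attributes; the runtime version and VM size are placeholders, not what we actually run):

    # Rough sketch of a new_cluster block with spot-to-on-demand fallback.
    # SPOT_WITH_FALLBACK_AZURE uses spot VMs but falls back to on-demand when
    # spot capacity is unavailable or instances are evicted.
    new_cluster = {
        "spark_version": "13.3.x-scala2.12",   # placeholder runtime
        "node_type_id": "Standard_DS3_v2",     # placeholder VM size
        "num_workers": 4,
        "azure_attributes": {
            "availability": "SPOT_WITH_FALLBACK_AZURE",  # fall back to on-demand
            "first_on_demand": 1,                        # keep the driver on-demand
            "spot_bid_max_price": -1,                    # bid up to the on-demand price
        },
    }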

Before I dive deeper and get lost in the rabbit hole, I wanted to poll the community first.

Thanks

Tim.

1 REPLY

Anonymous
Not applicable

Hi @Timothy Lin

Great to meet you, and thanks for your question!

Let's see if your peers in the community have an answer to your question. Thanks.
