Databricks runs cell, but stops output and hangs afterwards.

ThomasKastl
Contributor

tl;dr: A cell that executes purely on the head node stops printing output mid-execution, although the output still shows up in the cluster logs. Once the cell finishes, Databricks does not notice that it has completed and hangs. Trying to cancel gets stuck as well, and we have to "Clear state".

Long version:

We use the tsfresh library (https://github.com/blue-yonder/tsfresh) in Databricks on the head node only (no Spark, just Python). On most runs, the output of the notebook cell simply stops while the cell is still being executed: the notebook shows no new output, even though the cell keeps running in the background. We know this because files generated by the cell are still being written, and output keeps appearing under Cluster -> Driver Logs.

That in itself wouldn't really be a problem, but Databricks never realizes the cell has finished, so the next cell is never executed. The cell also cannot be cancelled the regular way: simply cancelling gets stuck, and we have to clear the state, which means losing all computation results that haven't been written out.

This happened with Runtime 7.3 LTS; we have since switched to 10.4 LTS and the problem still persists. We tried different head node sizes, and sometimes it gets stuck sooner, sometimes later; the behavior isn't consistent. We suspect it has something to do with how tsfresh handles parallel processing, but the problem seems to occur even when we turn the parallelism off (see the sketch below).
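For reference, this is roughly how we disable tsfresh's parallelism (a minimal sketch; the DataFrame and column names are placeholders, n_jobs=0 is tsfresh's switch for turning off multiprocessing):

from tsfresh import extract_features

# n_jobs=0 tells tsfresh not to spawn any worker processes, so the whole
# feature extraction runs in the driver's main Python process.
# "id" and "time" are placeholder column names for our data.
features = extract_features(
    timeseries_df,
    column_id="id",
    column_sort="time",
    n_jobs=0,
)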

This never happens in local versions of the Python notebooks, which leads us to assume it is a problem / bug with Databricks itself.

Any pointers on what we can try, or on how to get in contact with someone from Databricks to look into this?

6 REPLIES

-werners-
Esteemed Contributor III

I'd open a support ticket with Databricks (it probably has to go via your cloud provider).

Hubert-Dudek
Esteemed Contributor III

Since that library works on pandas, the problem could be that it doesn't support pandas on Spark. On the local version you are probably using plain, non-distributed pandas. You can check the behavior by switching between:

# plain pandas (what you most likely use locally)
import pandas as pd

# pandas API on Spark (distributed); same alias, so the rest of the code stays unchanged
import pyspark.pandas as pd
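A quick sanity check (just a sketch; assumes a runtime where pyspark.pandas is available, i.e. DBR 10.x or later) to see which implementation a DataFrame actually comes from:

import pandas
import pyspark.pandas

df_local = pandas.DataFrame({"x": [1, 2, 3]})
df_spark = pyspark.pandas.DataFrame({"x": [1, 2, 3]})

# plain pandas prints <class 'pandas.core.frame.DataFrame'>,
# pandas-on-Spark prints <class 'pyspark.pandas.frame.DataFrame'>
print(type(df_local))
print(type(df_spark))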

Do you mean that it uses Spark even if I don't tell it to, somehow recognizing it on its own? I am not using Spark at all: I am running the exact same code as locally, and when I check the Spark jobs on the Databricks machine there are none (as I expect...).

I am using Databricks basically as a "local" machine that I can quickly deploy in the cloud; I am not intending to use any of the Spark / cluster functionality...

-werners-
Esteemed Contributor III

It won't use Spark unless you call Spark functions (a SparkContext will be created automatically, though).

Maybe you can try using the IPython kernel. As of Databricks Runtime 11.0 it is the default kernel for Python workloads, so I'd try that.

Thanks! This actually seems to solve the problem, so I assume the IPython kernel did the trick. Do you know what was used instead in versions < 11.0? The docs don't seem to say...

-werners-
Esteemed Contributor III

I suppose it ran on something Databricks built themselves; they probably created a custom kernel with similar properties.
