Is command stuck?

DejanSunderic
New Contributor III

I created some ETL using DataFrames in Python. It used to run in ~180 sec, but it is now taking ~1200 sec. I have been changing it, so it could be something I introduced, or something in the environment.

Part of the process is appending results into a file on S3.

I am looking at the Spark jobs and I cannot see that any of them is active.

While I was writing this, I got: org.apache.spark.SparkException: Job aborted.

Command took 1274.63s -- by xxxxxxxx@gmail.com

at 8/4/2016, 12:44:17 PM on def4 (150 GB)

I have attached output that I got:

command-output.txt

I assume that I should be able to see in Spark UI what is active. I was surprised that Active Tasks on all executors was 0. Should I look at something else?
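(Editorial note for later readers: the Spark UI is the right place to look, and since Spark 1.4 the same information is also exposed as JSON through the monitoring REST API on the driver, port 4040 by default, which makes it easy to poll from outside the notebook. A minimal sketch; the host name and application id in the usage comment are placeholders you would read off your own cluster:)

```python
import json
import urllib.request

def jobs_endpoint(ui_url, app_id, status="running"):
    """Build the Spark monitoring REST API URL for jobs in a given state
    (valid states per the monitoring docs: running, succeeded, failed, unknown)."""
    return "%s/api/v1/applications/%s/jobs?status=%s" % (ui_url.rstrip("/"), app_id, status)

def running_jobs(ui_url, app_id):
    """Return the JSON list of currently running jobs from the driver's UI."""
    with urllib.request.urlopen(jobs_endpoint(ui_url, app_id)) as resp:
        return json.load(resp)

# e.g. running_jobs("http://driver-host:4040", "app-20160804121212-0001")
```

If this list is empty while a cell still shows "Running command", the driver is busy (or stuck) outside of any Spark job, e.g. in plain Python code or in GC, which matches the zero active tasks seen on the executors.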

I tried to restart the cluster, but it was the same before and after. I used the same version of Spark 1.6.2 (Hadoop 2).

11 REPLIES

DejanSunderic
New Contributor III

While I was waiting for a response (I had lunch in the meantime), I decided to do something else in this notebook, so I cloned it...

I have some initialization code in the notebook. It was taking 60 sec before cloning and 1.4 sec after. Wow!

Did you (Databricks support) do something on the cluster?

I am going to run my ETL command.

It was running very fast and then it got "stuck" again. I do not see any Spark job running.

In the meantime I got the idea to look into the driver log. I found this:

2016-08-04T19:19:57.980+0000: [GC (Allocation Failure) [PSYoungGen: 6827008K->52511K(7299584K)] 7660819K->886330K(22848000K), 0.0142959 secs] [Times: user=0.08 sys=0.01, real=0.01 secs]

...

04T19:27:03.294+0000: [GC (Allocation Failure) [PSYoungGen: 7270001K->134234K(7454208K)] 8103861K->968093K(23002624K), 0.0509207 secs] [Times: user=0.33 sys=0.00, real=0.05 secs]
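(Editorial note: the two GC lines quoted above are ordinary minor collections with sub-0.1 s pauses, so by themselves they do not explain a 3600 s run. A quick way to rule GC in or out is to total the real, i.e. wall-clock, pause time across the driver log. A minimal sketch in plain Python; the regex and the sample lines are modelled on the log format above, and where you read the log from is up to you:)

```python
import re

# Matches the trailing "[Times: user=... sys=..., real=... secs]" field
# of a ParallelGC log line like the ones in the driver log above.
REAL_SECS = re.compile(r"real=([\d.]+) secs")

def total_gc_pause(log_lines):
    """Sum wall-clock GC pause time (seconds) over an iterable of GC log lines."""
    total = 0.0
    for line in log_lines:
        match = REAL_SECS.search(line)
        if match:
            total += float(match.group(1))
    return total

# Sample lines shaped like the driver-log excerpts above.
sample = [
    "2016-08-04T19:19:57.980+0000: [GC (Allocation Failure) "
    "[PSYoungGen: 6827008K->52511K(7299584K)] ... [Times: user=0.08 sys=0.01, real=0.01 secs]",
    "04T19:27:03.294+0000: [GC (Allocation Failure) "
    "[PSYoungGen: 7270001K->134234K(7454208K)] ... [Times: user=0.33 sys=0.00, real=0.05 secs]",
]
print(round(total_gc_pause(sample), 2))  # 0.06
```

If the summed pause time is a large fraction of the command's wall-clock time, GC tuning (or a larger driver) is the direction to look; for the two lines above it clearly is not.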

The process finally finished after 3600 sec (3x slower than the long duration I was complaining about).

DejanSunderic
New Contributor III

Today at some point I created a new cluster again.

Suddenly everything got much faster. It is back to 270 - 330 sec.

My question still stands: how do I know what the server is doing, and why it is slow or stuck?

btw, how long does it take to moderate a question?

Was this issue resolved? I'm also getting the same problem on my Spark cluster.

NickStudenski
New Contributor III

I have a similar issue. Several times per week I see a very slow (5+ minutes) "Running command" on a cell that should take under 1 second to execute. Restarting the cluster usually solves the problem, but it is still a major inconvenience.

datadro
New Contributor II

Check for GC (garbage collection) errors in the cluster's standard output.

https://databricks.com/blog/2015/05/28/tuning-java-garbage-collection-for-spark-applications.html
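(Editorial note: if GC logs are not already being written, they can be switched on through the standard Spark JVM-option settings. A sketch; the flags are the usual HotSpot GC-logging options for the Java 7/8 JVMs that Spark 1.6 ran on, and `your_etl_job.py` is a placeholder:)

```shell
# Enable verbose GC logging on both the driver and the executors
spark-submit \
  --conf "spark.driver.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps" \
  --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps" \
  your_etl_job.py
```

The resulting lines in stdout look exactly like the driver-log excerpts earlier in this thread.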

NickStudenski
New Contributor III

I am getting this same issue. Occasionally a cell will display "Running Command" for as long as an hour. This can happen even for simple commands that ordinarily run in less than a second. I have tried restarting the cluster and attaching to a different cluster. Nothing seems to help.

sandeep8530
New Contributor II

Hi,

Facing the same issue. Has anyone found a solution?

Risingi
New Contributor II

Mm, probably yes

Carneiro
New Contributor II

I am having a very similar problem.

Since yesterday, for no known reason, some commands that used to run daily are now stuck in a "Running command" state. Commands like:

dataframe.show(n=1)

dataframe.toPandas()

dataframe.describe()

dataframe.write.format("csv").save(location)

are now stuck even for quite small DataFrames (28 rows and 5 columns, for example). I would appreciate any help, since the problem also affects important daily jobs.
