Zombie .NET Spark Databricks Job (CoarseGrainedExecutorBackend)

timothy_uk
New Contributor III

Hi all,

Environment:

Nodes: Standard_E8s_v3

Databricks Runtime: 9.0

.NET for Apache Spark 2.0.0

I'm invoking spark-submit to run a .NET Spark job hosted in Azure Databricks. The job is written in C#/.NET; its only transformation and action are reading a CSV and then displaying its records. The job had been running swimmingly for months until recently, when I noticed it no longer self-terminates after completion. The job performs the work, then remains active indefinitely until I manually terminate it.
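For context, the submit invocation looks roughly like the following (the runner class is the standard .NET for Apache Spark entry point; the jar path, assembly name, and argument shown here are illustrative rather than my exact configuration):

spark-submit \
        --class org.apache.spark.deploy.dotnet.DotnetRunner \
        <path-to>/microsoft-spark-3-1_2.12-2.0.0.jar \
        dotnet MySparkApp.dll <source-csv-path>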

This is the app's code:

using Microsoft.Spark.Sql;
 
// Create (or reuse) the Spark session for this job.
SparkSession spark = SparkSession
        .Builder()
        .AppName("My App Name")
        .GetOrCreate();
 
// The first command-line argument is the CSV source path.
string sourcePath = args[0];
 
// Read the CSV with a header row and double-quoted fields.
DataFrame df = spark
        .Read()
        .Option("header", "true")
        .Option("quote", "\"")
        .Csv(sourcePath);
 
// Action: display the records.
df.Show();
 
// Explicitly stop the Spark session.
spark.Stop();

I've attached a dump of the driver's Log4j output.

Edit 16/12/2021:

The problem is possibly related to the workers refusing to shut down after completing their work, as indicated by the final entries of the workers' stderr output...

21/12/16 00:23:10 INFO DBFS: Initialized DBFS with DBFSV2 as the delegate.

21/12/16 00:23:10 INFO Utils: resolved command to be run: WrappedArray(getconf, PAGESIZE)

21/12/16 00:23:10 INFO FileScanRDD: Reading File path: dbfs:/mnt/opstats/raw/snakes/intercom.csv, range: 0-695442, partition values: [empty row], modificationTime: 1639438344000.

21/12/16 00:23:10 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1703 bytes result sent to driver

21/12/16 00:23:16 INFO CoarseGrainedExecutorBackend: Driver commanded a shutdown

21/12/16 00:23:16 INFO MemoryStore: MemoryStore cleared

21/12/16 00:23:16 INFO BlockManager: BlockManager stopped

21/12/16 00:23:16 ERROR CoarseGrainedExecutorBackend: RECE

Is anybody able to shed some light on this mysterious issue?

Thanks

Tim.


4 REPLIES

Kaniz
Community Manager

Hi @timothy_uk! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Or else I will get back to you soon. Thanks.

jose_gonzalez
Moderator

Hi @Timothy Lin,

I recommend not using spark.Stop() or System.exit(0) in your code: it explicitly stops the Spark context, but the graceful shutdown and handshake with the Databricks job service does not happen.

timothy_uk
New Contributor III

Thank you @Jose Gonzalez, then shall I try removing those operations?

jose_gonzalez
Moderator

Yes, you need to remove them.
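A minimal sketch of the revised job, assuming the only change is dropping the explicit stop (session teardown is then left to the Databricks job service once the app exits):

using Microsoft.Spark.Sql;
 
// Same job as the original snippet, minus the explicit stop.
SparkSession spark = SparkSession
        .Builder()
        .AppName("My App Name")
        .GetOrCreate();
 
string sourcePath = args[0];
 
DataFrame df = spark
        .Read()
        .Option("header", "true")
        .Option("quote", "\"")
        .Csv(sourcePath);
 
df.Show();
// No spark.Stop() or System.exit(0) here; shutdown is handled by the job service.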
