Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Zombie .NET Spark Databricks Job (CoarseGrainedExecutorBackend)

timothy_uk
New Contributor III

Hi all,

Environment:

Nodes: Standard_E8s_v3

Databricks Runtime: 9.0

.NET for Apache Spark 2.0.0

I'm invoking spark-submit to run a .NET for Apache Spark job hosted in Azure Databricks. The job is written in C# and performs a single transformation and action: it reads a CSV file and then displays its records. The job had been running swimmingly for months, but recently I've noticed that it no longer self-terminates after completion. It performs the work and then remains active indefinitely until I terminate it manually.

This is the app's code:

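using Microsoft.Spark.Sql;

class Program
{
    static void Main(string[] args)
    {
        SparkSession spark = SparkSession
            .Builder()
            .AppName("My App Name")
            .GetOrCreate();

        // Path of the CSV file, passed as the first spark-submit application argument.
        string sourcePath = args[0];

        DataFrame df = spark
            .Read()
            .Option("header", "true")
            .Option("quote", "\"")
            .Csv(sourcePath);

        df.Show();

        spark.Stop();
    }
}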

I've attached a dump of the driver's Log4j output.

Edit 16/12/2021:

The problem may be related to the workers refusing to shut down after completing their work, as indicated by the final entries of the workers' stderr output:

21/12/16 00:23:10 INFO DBFS: Initialized DBFS with DBFSV2 as the delegate.
21/12/16 00:23:10 INFO Utils: resolved command to be run: WrappedArray(getconf, PAGESIZE)
21/12/16 00:23:10 INFO FileScanRDD: Reading File path: dbfs:/mnt/opstats/raw/snakes/intercom.csv, range: 0-695442, partition values: [empty row], modificationTime: 1639438344000.
21/12/16 00:23:10 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1703 bytes result sent to driver
21/12/16 00:23:16 INFO CoarseGrainedExecutorBackend: Driver commanded a shutdown
21/12/16 00:23:16 INFO MemoryStore: MemoryStore cleared
21/12/16 00:23:16 INFO BlockManager: BlockManager stopped
21/12/16 00:23:16 ERROR CoarseGrainedExecutorBackend: RECE

Is anybody able to shed some light on this mysterious issue?

Thanks

Tim.


jose_gonzalez
Moderator

Hi @Timothy Lin,

I recommend not using spark.stop() or System.exit(0) in your code. These calls explicitly stop the Spark context, so the graceful shutdown and handshake with the Databricks job service never happens.

Thank you @Jose Gonzalez. So should I try removing those operations?

Yes, you need to remove them.
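Applied to the C# app from the question, the moderator's suggestion amounts to dropping the final spark.Stop() call. The sketch below assumes the same .NET for Apache Spark 2.0.0 (Microsoft.Spark) API used in the question; the Program class wrapper is an assumption, since the post shows only the body of the entry point. The thread does not confirm that this change alone stops the executors from lingering on Databricks Runtime 9.0.

```csharp
using Microsoft.Spark.Sql;

class Program
{
    static void Main(string[] args)
    {
        // Same session setup as in the question.
        SparkSession spark = SparkSession
            .Builder()
            .AppName("My App Name")
            .GetOrCreate();

        // CSV path passed as the first spark-submit application argument.
        string sourcePath = args[0];

        DataFrame df = spark
            .Read()
            .Option("header", "true")
            .Option("quote", "\"")
            .Csv(sourcePath);

        df.Show();

        // Deliberately no spark.Stop() and no Environment.Exit(...) here:
        // Main simply returns, leaving the shutdown of the Spark context
        // to the Databricks job service, as the moderator recommends.
    }
}
```

The only behavioral difference from the original is the missing explicit stop. Environment.Exit is mentioned because it is the .NET counterpart of the System.exit(0) call the moderator warns against.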
