Zombie .NET Spark Databricks Job (CoarseGrainedExecutorBackend)

timothy_uk
New Contributor III

Hi all,

Environment:

Nodes: Standard_E8s_v3

Databricks Runtime: 9.0

.NET for Apache Spark 2.0.0

I'm invoking spark-submit to run a .NET for Apache Spark job hosted in Azure Databricks. The job is written in C# and performs a single transformation and action: it reads a CSV, then displays its records. The job had been running swimmingly for months, but recently I've noticed that it no longer self-terminates after completion. It performs its work, then remains active indefinitely until I manually terminate it.

This is the app's code:

SparkSession spark = SparkSession
        .Builder()
        .AppName("My App Name")
        .GetOrCreate();
 
string sourcePath = args[0];
 
DataFrame df = spark
        .Read()
        .Option("header", "true")
        .Option("quote", "\"")
        .Csv(sourcePath);
 
df.Show();
 
spark.Stop();

I've attached a dump of the driver's Log4j output.

Edit 16/12/2021:

The problem is possibly related to the workers refusing to shut down after completing their work, as indicated by the final entry in the workers' stderr output:

21/12/16 00:23:10 INFO DBFS: Initialized DBFS with DBFSV2 as the delegate.

21/12/16 00:23:10 INFO Utils: resolved command to be run: WrappedArray(getconf, PAGESIZE)

21/12/16 00:23:10 INFO FileScanRDD: Reading File path: dbfs:/mnt/opstats/raw/snakes/intercom.csv, range: 0-695442, partition values: [empty row], modificationTime: 1639438344000.

21/12/16 00:23:10 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1703 bytes result sent to driver

21/12/16 00:23:16 INFO CoarseGrainedExecutorBackend: Driver commanded a shutdown

21/12/16 00:23:16 INFO MemoryStore: MemoryStore cleared

21/12/16 00:23:16 INFO BlockManager: BlockManager stopped

21/12/16 00:23:16 ERROR CoarseGrainedExecutorBackend: RECE

Is anybody able to shed some light on this mysterious issue?

Thanks

Tim.