Cluster configuration and optimal numbers for fs.s3a.connection.maximum, fs.s3a.threads.max

Vee
New Contributor

Could you please suggest the best cluster configuration for the use case stated below, along with tips to resolve the errors shown below?

Use case:

There can be 4 or 5 Spark jobs running concurrently.

Each job reads 40 input files and writes 120 output files to S3 in CSV format (three times the number of input files).

All concurrent jobs read the same 39 input files; only one input file varies per job.
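
Roughly, each job's read/write pattern looks like the sketch below; the paths, CSV options, and exact partition count are placeholders rather than the actual job code:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("csv-fanout").getOrCreate()

// Read the 40 input CSV files for this job (placeholder S3 path).
val input = spark.read
  .option("header", "true")
  .csv("s3a://my-bucket/input/*.csv")

// Produce roughly three output files per input file by repartitioning to 120
// before writing CSV back to S3 (placeholder output path).
input.repartition(120)
  .write
  .option("header", "true")
  .mode("overwrite")
  .csv("s3a://my-bucket/output/")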

Often the jobs fail with the following errors:

Job aborted due to stage failure: Task 0 in stage 3084.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3084.0 (TID...., ip..., executor 0): org.apache.spark.SparkException: Task failed while writing rows

Job aborted due to stage failure: Task 0 in stage 3078.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3078.0 (TID...., ip..., executor 0): java.io.InterruptedIOException: getFileStatus on s3:<file path>: com.amazonaws.SdkClientException: Unable to execute HTTP request. Timeout waiting for connection from pool

Given below is my SparkConf:

import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoSerializer

new SparkConf()
  .set("spark.serializer", classOf[KryoSerializer].getName)
  .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
  .set("spark.hadoop.fs.s3a.connection.maximum", "400")
  .set("fs.s3a.threads.max", "200")
  .set("spark.hadoop.fs.s3a.fast.upload", "true")

The Spark UI Environment section shows

spark.hadoop.fs.s3a.connection.maximum = 200

fs.s3a.threads.max = 136

and these values do not align with my settings.
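
One way to check which values actually reach the S3A layer, as opposed to what the SparkConf object holds, is to read them back from the driver's Hadoop Configuration. A small sketch, assuming an active SparkSession named spark:

// Read back the effective S3A settings from the Hadoop Configuration,
// which is what the S3AFileSystem actually consults at runtime.
val hadoopConf = spark.sparkContext.hadoopConfiguration
println(hadoopConf.get("fs.s3a.connection.maximum"))
println(hadoopConf.get("fs.s3a.threads.max"))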

Questions:

(1) What needs to be done to cache the input files so that subsequent concurrent jobs can reuse them? Would a storage-optimized cluster with the Delta cache enabled do this? (See the sketch after these questions.)

(2) Why don't the numbers in the Spark UI Environment section match my SparkConf settings?

(3) How do I resolve these job errors?
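
On (1), a minimal sketch of persisting the 39 shared input files within a single application, with placeholder paths and assuming an active SparkSession named spark. A DataFrame persist only helps reuse within one job; sharing across separately submitted concurrent jobs would rely on a cluster-level mechanism such as the Databricks disk (Delta) cache on supported instance types:

import org.apache.spark.storage.StorageLevel

// Read the 39 shared input files once and persist them so every downstream
// action in this job reuses the cached data instead of re-reading S3.
val shared = spark.read
  .option("header", "true")
  .csv("s3a://my-bucket/shared-input/*.csv")   // placeholder path
  .persist(StorageLevel.MEMORY_AND_DISK)

shared.count()  // materialize the cache before the heavy write stages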

Thanks,

Vee

2 REPLIES

jose_gonzalez
Moderator

Hi @Vetrivel Senthil,

Just wondering if this question is a duplicate of this one: https://community.databricks.com/s/feed/0D53f00001qvQJcCAM?

Kaniz
Community Manager

Hi @Vetrivel Senthil, just a friendly follow-up. Do you still need help? Please let us know.
