Thanks for your reply.
Below are my answers to each of the queries.
1. What error is the job returning after 2 hours of execution?
-> Below is a screenshot of the error:

2. What is the data source we are talking about? Are you trying to load data from a database, another warehouse, a bunch of csv/json/excel files?
-> Currently we are using PySpark and Python code to create the data, so the source is a DataFrame.
3. How are you chunking the data? (if you could give a code example...)
-> The DataFrame df contains the data:
total_rows = 1269408570800
batch_size = 1000000
num_batches = (total_rows // batch_size) + (1 if total_rows % batch_size != 0 else 0)
print(num_batches)
for i in range(num_batches):
    batch_df = df.filter(f"spark_partition_id() == {i}")
    batch_df.write.mode("overwrite").saveAsTable(table_path)
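For reference, here is a minimal runnable sketch of that loop as I read it (not the exact production code): it iterates over the DataFrame's actual partition count rather than num_batches, since spark_partition_id() only takes values up to that count, and it appends after the first batch so earlier batches are not overwritten.

from pyspark.sql import functions as F

# Sketch only: write the DataFrame one Spark partition at a time.
# Assumes df, spark and table_path are defined as in the snippet above.
num_partitions = df.rdd.getNumPartitions()

for i in range(num_partitions):
    batch_df = df.filter(F.spark_partition_id() == i)
    # Overwrite on the first batch creates/replaces the table; append afterwards
    # so previously written batches are kept.
    mode = "overwrite" if i == 0 else "append"
    batch_df.write.mode(mode).saveAsTable(table_path)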
I've also tried using SQL to load the data. The data loads into the table if I set the limit to 100,000 rows, but if I increase the limit it fails with the same executor error. Uploading only 100,000 rows at a time would take far too long.
INSERT INTO dspf_test.bulksaleshourlydataperformance_1845.t_hourly_sales_summary_result SELECT * FROM temp_view LIMIT 100000000
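In case it is useful, below is a rough sketch of how that SQL route could be driven in batches from PySpark instead of a single large LIMIT. The names n_batches, batch_id and temp_view_batched are illustrative assumptions, not something already in the job.

from pyspark.sql import functions as F

# Sketch only: tag each row with a batch id once, then run one INSERT per batch.
# df and spark come from the code above; batch sizes are only roughly even.
n_batches = 100  # placeholder for however many insert batches are wanted
cols = ", ".join(df.columns)

df.withColumn("batch_id", F.monotonically_increasing_id() % n_batches) \
  .createOrReplaceTempView("temp_view_batched")

for i in range(n_batches):
    spark.sql(f"""
        INSERT INTO dspf_test.bulksaleshourlydataperformance_1845.t_hourly_sales_summary_result
        SELECT {cols}
        FROM temp_view_batched
        WHERE batch_id = {i}
    """)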
4. Are there any resource bottlenecks visible in the Spark UI (e.g., memory, CPU, disk, network)?
-> I could see that CPU utilisation for all 4 executors has reached 80%.
5. Do you see any relevant warnings or errors in the Spark executor/driver logs before the failure?
-> Yes. ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 146027 ms
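For what it's worth, the timeout in that message is governed by the Spark settings below; the values shown are purely illustrative, and they would normally be set when the cluster/session is created.

from pyspark.sql import SparkSession

# Sketch only: the settings that control the executor heartbeat timeout
# reported in the error above (illustrative values, not a recommendation).
spark = (
    SparkSession.builder
    .config("spark.executor.heartbeatInterval", "60s")  # default is 10s
    .config("spark.network.timeout", "600s")            # must stay larger than the heartbeat interval
    .getOrCreate()
)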
6. Is the cluster fixed-size with 4 workers, or is auto-scale enabled?
-> Autoscale is not enabled.