<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Results from the spark application to driver in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/results-from-the-spark-application-to-driver/m-p/135376#M50333</link>
    <description>Latest reply in the Data Engineering thread “Results from the spark application to driver”.</description>
    <pubDate>Sun, 19 Oct 2025 13:54:39 GMT</pubDate>
    <dc:creator>K_Anudeep</dc:creator>
    <dc:date>2025-10-19T13:54:39Z</dc:date>
    <item>
      <title>Results from the spark application to driver</title>
      <link>https://community.databricks.com/t5/data-engineering/results-from-the-spark-application-to-driver/m-p/135265#M50317</link>
      <description>&lt;P&gt;I have read many articles but am still not clear on this:&lt;/P&gt;&lt;P&gt;The executors complete the execution of tasks and have the results with them.&lt;/P&gt;&lt;P&gt;1. Are the results (output data) from all executors transported to the driver in all cases, or do the executors persist them directly when the destination is file storage etc.?&lt;/P&gt;&lt;P&gt;2. If results are transported back to the driver in all cases, how is that achieved? Is there a link to a document describing that in detail?&lt;/P&gt;</description>
      <pubDate>Fri, 17 Oct 2025 17:24:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/results-from-the-spark-application-to-driver/m-p/135265#M50317</guid>
      <dc:creator>raghvendrarm1</dc:creator>
      <dc:date>2025-10-17T17:24:02Z</dc:date>
    </item>
    <item>
      <title>Re: Results from the spark application to driver</title>
      <link>https://community.databricks.com/t5/data-engineering/results-from-the-spark-application-to-driver/m-p/135349#M50327</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/190628"&gt;@raghvendrarm1&lt;/a&gt;&amp;nbsp;- Have a look at the section "&lt;I&gt;Apache Spark’s Distributed Execution&lt;/I&gt;" in chapter 1 of&amp;nbsp;Learning Spark, 2nd Edition (&lt;SPAN&gt;&lt;A href="https://www.oreilly.com/library/view/learning-spark-2nd/9781492050032/ch01.html" target="_blank"&gt;https://www.oreilly.com/library/view/learning-spark-2nd/9781492050032/ch01.html&lt;/A&gt;). Take a look at the picture:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="dkushari_0-1760818756989.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/20844i419EBF1EFF371B0D/image-size/medium?v=v2&amp;amp;px=400" role="button" title="dkushari_0-1760818756989.png" alt="dkushari_0-1760818756989.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 18 Oct 2025 20:19:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/results-from-the-spark-application-to-driver/m-p/135349#M50327</guid>
      <dc:creator>dkushari</dc:creator>
      <dc:date>2025-10-18T20:19:33Z</dc:date>
    </item>
    <item>
      <title>Re: Results from the spark application to driver</title>
      <link>https://community.databricks.com/t5/data-engineering/results-from-the-spark-application-to-driver/m-p/135376#M50333</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/190628"&gt;@raghvendrarm1&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Below are the answers to your questions:&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;Do executors always send “results” to the driver?&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;No. Only actions that return values (e.g., collect, take, first, count) bring data back to the driver. collect explicitly “returns all the records … as a list” in the driver’s memory, and the total size is bounded by spark.driver.maxResultSize. In contrast, writes (df.write…, INSERT, SAVE) are performed by the executors directly against storage, with the driver only coordinating commits. Shuffles move data between executors, not through the driver.&lt;/LI&gt;
&lt;/UL&gt;
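&lt;P&gt;A toy sketch in plain Python (not Spark’s actual code) of the two paths above: tasks serialize their partition results, and the driver either gathers the bytes (collect-style, capped the way spark.driver.maxResultSize caps real jobs) or receives only small per-task commit messages while the executors persist the data themselves. All names here (MAX_RESULT_SIZE, the storage dict, the commit-message shape) are illustrative stand-ins.&lt;/P&gt;

```python
# Toy model of result flow in Spark; plain Python, not Spark internals.
import pickle

MAX_RESULT_SIZE = 1024  # stand-in for spark.driver.maxResultSize (bytes)

def run_tasks(partitions, task):
    # Each "executor" runs the task and pickles its partition's result,
    # just as Spark serializes task results before sending them back.
    return [pickle.dumps(task(p)) for p in partitions]

def collect(partitions):
    # collect-style action: all serialized results travel to the "driver".
    blobs = run_tasks(partitions, lambda p: p)  # identity task: return the rows
    total = sum(len(b) for b in blobs)
    if total > MAX_RESULT_SIZE:
        # This is the failure mode of an oversized collect() on a real driver.
        raise MemoryError("serialized results exceed the driver-side cap")
    rows = []
    for b in blobs:
        rows.extend(pickle.loads(b))
    return rows

def write(partitions, storage):
    # write-style job: executors persist their output directly; only a
    # small commit message per task reaches the driver, which then
    # finalizes the job commit from those messages.
    commits = []
    for i, p in enumerate(partitions):
        storage[f"part-{i}"] = p                      # executor writes data
        commits.append({"task": i, "rows": len(p)})   # tiny ack to driver
    return commits

parts = [[1, 2, 3], [4, 5], [6]]
print(collect(parts))        # small data: fine to bring back to the driver
store = {}
print(write(parts, store))   # driver sees only commit metadata, not the rows
```

&lt;P&gt;Calling collect on a partition set whose serialized size exceeds the cap raises the error, which mirrors why a large collect() can OOM a real driver while the same data written with df.write never transits it.&lt;/P&gt;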
&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;How is it done under the hood (at a high level)?&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Return-to-driver actions: each task serializes its partition’s result, and the driver gathers them (subject to spark.driver.maxResultSize). That’s why a large collect() can OOM the driver.&lt;/LI&gt;
&lt;LI&gt;Writes: Spark’s Data Source API V2 has the executors write the partition outputs themselves and send only small commit messages back; the driver then finalizes the job commit.&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Sun, 19 Oct 2025 13:54:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/results-from-the-spark-application-to-driver/m-p/135376#M50333</guid>
      <dc:creator>K_Anudeep</dc:creator>
      <dc:date>2025-10-19T13:54:39Z</dc:date>
    </item>
  </channel>
</rss>

