Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Results from the Spark application to the driver

raghvendrarm1
New Contributor

I have read many articles but am still not clear on this:

The executors complete the execution of their tasks and hold the results.

1. Are the results (output data) from all executors always transported to the driver, or do the executors persist them directly when the output is destined for file storage etc.?

2. If the results are transported back to the driver in all cases, how is that achieved? Is there a link to a document describing this in detail?

2 REPLIES

dkushari
Databricks Employee

Hi @raghvendrarm1 - Have a look at the section "Apache Spark's Distributed Execution" in chapter 1 of Learning Spark, 2nd Edition (https://www.oreilly.com/library/view/learning-spark-2nd/9781492050032/ch01.html). Have a look at the picture -

[Attached image: dkushari_0-1760818756989.png]

K_Anudeep
Databricks Employee

Hello @raghvendrarm1,

Below are the answers to your questions:

Do executors always send "results" to the driver?

  • No. Only actions that return values (e.g., collect, take, first, count) bring data back to the driver. collect explicitly "returns all the records … as a list" in the driver's memory and is bounded by spark.driver.maxResultSize. In contrast, writes (df.write…, INSERT, SAVE) are performed by the executors directly against storage, with the driver just coordinating commits. Shuffles move data between executors, not through the driver.

How is it done under the hood (at a high level)?

  • Return-to-driver actions: each task serialises its partition's result; the driver gathers them (subject to spark.driver.maxResultSize). That's why a large collect() can OOM the driver.
  • Writes: Spark's Data Source API V2 has executors write partition outputs and send small commit messages back; the driver finalizes the job commit.
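If an action genuinely needs to pull a larger result back, the driver-side cap can be raised, e.g. in spark-defaults.conf or via --conf on spark-submit (a sketch; 4g is an arbitrary value, and 0 disables the limit entirely, which risks driver OOM):

```
# Cap on the total serialized size of task results the driver
# will accept for a single action (default is 1g)
spark.driver.maxResultSize  4g
```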