<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Results from the spark application to driver in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/results-from-the-spark-application-to-driver/m-p/135376#M50333</link>
    <description>Latest reply in the Data Engineering thread “Results from the spark application to driver”.</description>
    <pubDate>Sun, 19 Oct 2025 13:54:39 GMT</pubDate>
    <dc:creator>K_Anudeep</dc:creator>
    <dc:date>2025-10-19T13:54:39Z</dc:date>
    <item>
      <title>Results from the spark application to driver</title>
      <link>https://community.databricks.com/t5/data-engineering/results-from-the-spark-application-to-driver/m-p/135265#M50317</link>
      <description>&lt;P&gt;I have read many articles but am still not clear on this:&lt;/P&gt;&lt;P&gt;The executors complete the execution of tasks and have the results with them.&lt;/P&gt;&lt;P&gt;1. Are the results (output data) from all executors transported to the driver in all cases, or do the executors persist them directly when the destination is file storage etc.?&lt;/P&gt;&lt;P&gt;2. If results are transported back to the driver in all cases, how is that achieved? Is there a link to a document describing that in detail?&lt;/P&gt;</description>
      <pubDate>Fri, 17 Oct 2025 17:24:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/results-from-the-spark-application-to-driver/m-p/135265#M50317</guid>
      <dc:creator>raghvendrarm1</dc:creator>
      <dc:date>2025-10-17T17:24:02Z</dc:date>
    </item>
    <item>
      <title>Re: Results from the spark application to driver</title>
      <link>https://community.databricks.com/t5/data-engineering/results-from-the-spark-application-to-driver/m-p/135349#M50327</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/190628"&gt;@raghvendrarm1&lt;/a&gt;&amp;nbsp;- Have a look at the section "&lt;I&gt;Apache Spark’s Distributed Execution&lt;/I&gt;" in chapter 1 of&amp;nbsp;Learning Spark, 2nd Edition (&lt;SPAN&gt;&lt;A href="https://www.oreilly.com/library/view/learning-spark-2nd/9781492050032/ch01.html" target="_blank"&gt;https://www.oreilly.com/library/view/learning-spark-2nd/9781492050032/ch01.html&lt;/A&gt;). Take a look at the picture:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="dkushari_0-1760818756989.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/20844i419EBF1EFF371B0D/image-size/medium?v=v2&amp;amp;px=400" role="button" title="dkushari_0-1760818756989.png" alt="dkushari_0-1760818756989.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 18 Oct 2025 20:19:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/results-from-the-spark-application-to-driver/m-p/135349#M50327</guid>
      <dc:creator>dkushari</dc:creator>
      <dc:date>2025-10-18T20:19:33Z</dc:date>
    </item>
    <item>
      <title>Re: Results from the spark application to driver</title>
      <link>https://community.databricks.com/t5/data-engineering/results-from-the-spark-application-to-driver/m-p/135376#M50333</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/190628"&gt;@raghvendrarm1&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Below are the answers to your questions:&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;Do executors always send “results” to the driver?&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;No. Only actions that return values (e.g., collect, take, first, count) bring data back to the driver. collect explicitly “returns all the records … as a list” in the driver’s memory, and the total size is bounded by spark.driver.maxResultSize. In contrast, writes (df.write…, INSERT, SAVE) are performed by the executors directly against storage, with the driver only coordinating commits. Shuffles move data between executors, not through the driver.&lt;/LI&gt;
&lt;/UL&gt;
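&lt;P&gt;A toy sketch in plain Python (not Spark’s actual code) of the two paths above: tasks serialize their partition results, and the driver either gathers the bytes (collect-style, capped the way spark.driver.maxResultSize caps real jobs) or receives only small per-task commit messages while the executors persist the data themselves. All names here (MAX_RESULT_SIZE, the storage dict, the commit-message shape) are illustrative stand-ins.&lt;/P&gt;

```python
# Toy model of result flow in Spark; plain Python, not Spark internals.
import pickle

MAX_RESULT_SIZE = 1024  # stand-in for spark.driver.maxResultSize (bytes)

def run_tasks(partitions, task):
    # Each "executor" runs the task and pickles its partition's result,
    # just as Spark serializes task results before sending them back.
    return [pickle.dumps(task(p)) for p in partitions]

def collect(partitions):
    # collect-style action: all serialized results travel to the "driver".
    blobs = run_tasks(partitions, lambda p: p)  # identity task: return the rows
    total = sum(len(b) for b in blobs)
    if total > MAX_RESULT_SIZE:
        # This is the failure mode of an oversized collect() on a real driver.
        raise MemoryError("serialized results exceed the driver-side cap")
    rows = []
    for b in blobs:
        rows.extend(pickle.loads(b))
    return rows

def write(partitions, storage):
    # write-style job: executors persist their output directly; only a
    # small commit message per task reaches the driver, which then
    # finalizes the job commit from those messages.
    commits = []
    for i, p in enumerate(partitions):
        storage[f"part-{i}"] = p                      # executor writes data
        commits.append({"task": i, "rows": len(p)})   # tiny ack to driver
    return commits

parts = [[1, 2, 3], [4, 5], [6]]
print(collect(parts))        # small data: fine to bring back to the driver
store = {}
print(write(parts, store))   # driver sees only commit metadata, not the rows
```

&lt;P&gt;Calling collect on a partition set whose serialized size exceeds the cap raises the error, which mirrors why a large collect() can OOM a real driver while the same data written with df.write never transits it.&lt;/P&gt;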
&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;How is it done under the hood (at a high level)?&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Return-to-driver actions: each task serializes its partition’s result, and the driver gathers them (subject to spark.driver.maxResultSize). That’s why a large collect() can OOM the driver.&lt;/LI&gt;
&lt;LI&gt;Writes: Spark’s Data Source API V2 has the executors write the partition outputs themselves and send only small commit messages back; the driver then finalizes the job commit.&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Sun, 19 Oct 2025 13:54:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/results-from-the-spark-application-to-driver/m-p/135376#M50333</guid>
      <dc:creator>K_Anudeep</dc:creator>
      <dc:date>2025-10-19T13:54:39Z</dc:date>
    </item>
  </channel>
</rss>

