<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic What are Accumulators and Broadcast Variables in Community Articles</title>
    <link>https://community.databricks.com/t5/community-articles/what-is-accumulators-and-broadcast-variables/m-p/76589#M153</link>
    <description>&lt;P&gt;&lt;SPAN&gt;1) &lt;STRONG&gt;Accumulators:&lt;/STRONG&gt;&lt;/SPAN&gt; &lt;SPAN&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;Accumulators are used to implement counters and sums in Spark applications.&lt;/SPAN&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;Accumulators allow you to aggregate values from tasks running on worker nodes back to the driver program.&amp;nbsp;They give tasks a way to incrementally update a shared variable (the accumulator) that is safe for distributed computation: tasks can only add to it, and only the driver can read its value.&amp;nbsp;The driver program can then access the final value of the accumulator after all tasks have completed. (The single authoritative copy lives on the driver machine.)&lt;/SPAN&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;Conclusion: Accumulators are an Apache Spark feature for collecting counts and sums while a distributed job runs over a large dataset.&amp;nbsp;They provide a simple and efficient way of accumulating values across multiple tasks in a distributed system.&amp;nbsp;By using accumulators in our Spark applications, we can track such aggregates without collecting the underlying data to the driver.&lt;/SPAN&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;2) &lt;STRONG&gt;Broadcast variables:&lt;/STRONG&gt;&lt;/SPAN&gt; &lt;SPAN&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;As the name suggests, broadcast variables are broadcast to the nodes of the Spark cluster, which can help avoid shuffle operations.&amp;nbsp;They allow you to efficiently distribute read-only data to all worker nodes in the cluster.&amp;nbsp;This data is cached in memory on each worker node, so tasks can access it without having to transfer the data over the network repeatedly.&amp;nbsp;Broadcast variables are particularly useful when a large lookup dataset or other read-only data needs to be shared across tasks.&lt;/SPAN&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;(Each machine holds its own separate copy.)&lt;/SPAN&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;Conclusion: The primary purpose of broadcast variables is to address the challenge of distributing shared data in distributed systems.&amp;nbsp;Instead of shipping a copy of the data with every task, which can be both time-consuming and resource-intensive, broadcast variables send the data to each machine in the cluster once.&amp;nbsp;By doing so, broadcast variables reduce repeated data transfers and improve the performance of distributed computations.&lt;/SPAN&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&lt;span class="lia-unicode-emoji" title=":handshake:"&gt;🤝&lt;/span&gt; Let's connect, engage, and grow together! I'm eager to hear your thoughts, experiences, and perspectives. Feel free to comment and share, and let's make this journey enriching for everyone.&lt;/SPAN&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&lt;span class="lia-unicode-emoji" title=":light_bulb:"&gt;💡&lt;/span&gt; Stay tuned for regular updates, and let's make our Community feed a place for inspiration and knowledge exchange!&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 03 Jul 2024 07:49:08 GMT</pubDate>
    <dc:creator>Yogic24</dc:creator>
    <dc:date>2024-07-03T07:49:08Z</dc:date>
    <item>
      <title>What are Accumulators and Broadcast Variables</title>
      <link>https://community.databricks.com/t5/community-articles/what-is-accumulators-and-broadcast-variables/m-p/76589#M153</link>
      <description>&lt;P&gt;&lt;SPAN&gt;1) &lt;STRONG&gt;Accumulators:&lt;/STRONG&gt;&lt;/SPAN&gt; &lt;SPAN&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;Accumulators are used to implement counters and sums in Spark applications.&lt;/SPAN&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;Accumulators allow you to aggregate values from tasks running on worker nodes back to the driver program.&amp;nbsp;They give tasks a way to incrementally update a shared variable (the accumulator) that is safe for distributed computation: tasks can only add to it, and only the driver can read its value.&amp;nbsp;The driver program can then access the final value of the accumulator after all tasks have completed. (The single authoritative copy lives on the driver machine.)&lt;/SPAN&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;Conclusion: Accumulators are an Apache Spark feature for collecting counts and sums while a distributed job runs over a large dataset.&amp;nbsp;They provide a simple and efficient way of accumulating values across multiple tasks in a distributed system.&amp;nbsp;By using accumulators in our Spark applications, we can track such aggregates without collecting the underlying data to the driver.&lt;/SPAN&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;2) &lt;STRONG&gt;Broadcast variables:&lt;/STRONG&gt;&lt;/SPAN&gt; &lt;SPAN&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;As the name suggests, broadcast variables are broadcast to the nodes of the Spark cluster, which can help avoid shuffle operations.&amp;nbsp;They allow you to efficiently distribute read-only data to all worker nodes in the cluster.&amp;nbsp;This data is cached in memory on each worker node, so tasks can access it without having to transfer the data over the network repeatedly.&amp;nbsp;Broadcast variables are particularly useful when a large lookup dataset or other read-only data needs to be shared across tasks.&lt;/SPAN&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;(Each machine holds its own separate copy.)&lt;/SPAN&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;Conclusion: The primary purpose of broadcast variables is to address the challenge of distributing shared data in distributed systems.&amp;nbsp;Instead of shipping a copy of the data with every task, which can be both time-consuming and resource-intensive, broadcast variables send the data to each machine in the cluster once.&amp;nbsp;By doing so, broadcast variables reduce repeated data transfers and improve the performance of distributed computations.&lt;/SPAN&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&lt;span class="lia-unicode-emoji" title=":handshake:"&gt;🤝&lt;/span&gt; Let's connect, engage, and grow together! I'm eager to hear your thoughts, experiences, and perspectives. Feel free to comment and share, and let's make this journey enriching for everyone.&lt;/SPAN&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&lt;span class="lia-unicode-emoji" title=":light_bulb:"&gt;💡&lt;/span&gt; Stay tuned for regular updates, and let's make our Community feed a place for inspiration and knowledge exchange!&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 03 Jul 2024 07:49:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/what-is-accumulators-and-broadcast-variables/m-p/76589#M153</guid>
      <dc:creator>Yogic24</dc:creator>
      <dc:date>2024-07-03T07:49:08Z</dc:date>
    </item>
    <item>
      <title>Re: What are Accumulators and Broadcast Variables</title>
      <link>https://community.databricks.com/t5/community-articles/what-is-accumulators-and-broadcast-variables/m-p/76749#M156</link>
      <description>&lt;P&gt;Thank you for sharing this,&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/107220"&gt;@Yogic24&lt;/a&gt;. I am sure it will help other community members.&lt;/P&gt;</description>
      <pubDate>Thu, 04 Jul 2024 11:15:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/what-is-accumulators-and-broadcast-variables/m-p/76749#M156</guid>
      <dc:creator>RishabhTiwari07</dc:creator>
      <dc:date>2024-07-04T11:15:57Z</dc:date>
    </item>
  </channel>
</rss>

