Knowledge Sharing Hub
Dive into a collaborative space where members like YOU can exchange knowledge, tips, and best practices. Join the conversation today and unlock a wealth of collective wisdom to enhance your experience and drive success.

Why do we use shared variables in Spark?

Yogic24
New Contributor II

Hello Databricks Community🙂!

I am excited to share my first blog post 🚀 with you all. This is a short, basic introduction to the concept of shared variables in Apache Spark. I hope it helps those who are new to Spark understand why shared variables matter and how to use them effectively.

When you pass a function like filter() to Spark, it's executed on the worker nodes in the cluster. This function can indeed access variables defined outside of it, but the changes made to those variables are not reflected back to the driver program automatically. This is because each task running on a worker node operates on its own copy of the variables, and these copies are not automatically synchronized with the variables in the driver program.

Accumulators and broadcast variables address this drawback: accumulators let tasks on the workers add to a shared value whose aggregated result is sent back to the driver program, while broadcast variables efficiently ship a read-only value from the driver to every worker once, instead of with each task.



🤝 Let's connect, engage, and grow together! I'm eager to hear your thoughts, experiences, and perspectives, and I look forward to your feedback and discussions with the community.

Thank you for your support!

 
