I have the following code:
from time import sleep
from random import random
from operator import add
def f(a: int) -> float:
    sleep(0.1)
    return random()
rdd1 = sc.parallelize(range(20), 2)
rdd2 = sc.parallelize(range(20), 2)
rdd3 = sc.parallelize(range(20), 2)
print('result a1:', rdd1.map(f).reduce(add))
print('result a2:', rdd2.map(f).reduce(add))
print('result a3:', rdd3.map(f).reduce(add))
print('result b3:', sum([f(a) for a in range(20)]))
print('result b3:', sum([f(a) for a in range(20)]))
print('result b3:', sum([f(a) for a in range(20)]))
Sample output:
result a1: 9.80073680418538
result a2: 9.80073680418538
result a3: 9.80073680418538
result b3: 9.219767385799257
result b3: 8.175800896981904
result b3: 9.417623482504323
Can anybody explain why the a* results all have the same value? I expected every result line to be different from the others.
How can I change the code so that the a* results are guaranteed to differ?
Tested using Runtime 10 and 12.
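One possible explanation, stated as an assumption rather than a confirmed diagnosis: if each Python worker process starts from the same copy of the `random` module's generator state, every job draws the same sequence, so the a* sums match. The sketch below reproduces that effect in plain Python without Spark, then shows one way to avoid it by creating a generator per partition seeded from OS entropy. The names `draw_with_copied_state` and `map_partition` are hypothetical helpers, not part of the original code, and the Spark call in the trailing comment is not executed here.

```python
import os
import random
from random import Random
from time import sleep


def draw_with_copied_state(state, n: int) -> float:
    # Simulates a worker that starts from an inherited copy of the generator state.
    rng = Random()
    rng.setstate(state)
    return sum(rng.random() for _ in range(n))


def map_partition(items):
    # Hypothetical helper: one generator per partition, seeded from OS entropy,
    # so the values do not depend on any state inherited from a parent process.
    rng = Random(os.urandom(16))
    for _ in items:
        sleep(0.1)  # same artificial delay as f in the question
        yield rng.random()


# Two "workers" that inherit the same state produce identical sums.
parent_state = random.getstate()
first = draw_with_copied_state(parent_state, 20)
second = draw_with_copied_state(parent_state, 20)
print('same inherited state:', first == second)

# Assumed usage with the RDDs from the question:
# print('result a1:', rdd1.mapPartitions(map_partition).reduce(add))
```

Seeding inside the partition function, rather than at module level, is the point of the design: the seed is then drawn when the task runs, not when the process state was copied.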