
[pyspark] foreach + print produces no output

JulioManuelNava
New Contributor

The following code produces no output. It seems as if the print(x) is not being executed for each "words" element:

words = sc.parallelize(
    ["scala",
     "java",
     "hadoop",
     "spark",
     "akka",
     "spark vs hadoop",
     "pyspark",
     "pyspark and spark"]
)
def f(x): print(x)
fore = words.foreach(f)

Any idea?

Thanks in advance


DiegoAlves
New Contributor II

The RDD.foreach method in Spark runs on the cluster, so each worker that holds some of these records executes the function you pass to foreach. In other words, your code is running, but the output is printed to the Spark workers' stdout, not to the driver (your shell session).

There is an easy alternative to print out the desired output:

for w in words.toLocalIterator():
    print(w)
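
If the RDD is large, you may not want to stream all of it to the driver. Here is a minimal sketch of two small helpers; the names print_sample and print_all are my own, not part of PySpark. They rely only on RDD.take and RDD.toLocalIterator, and they assume the elements are printable.

```python
def print_sample(rdd, n=5):
    """Print at most n elements on the driver.

    RDD.take(n) returns a plain Python list to the driver, so the
    print calls below run in your shell session, not on the workers.
    """
    sample = rdd.take(n)
    for item in sample:
        print(item)
    return sample


def print_all(rdd):
    """Print every element on the driver and return how many were printed.

    RDD.toLocalIterator() yields the elements to the driver without
    building one big list first, unlike collect().
    """
    count = 0
    for item in rdd.toLocalIterator():
        print(item)
        count += 1
    return count
```

With the words RDD from the question, print_sample(words, 3) prints up to three elements in your session and print_all(words) prints all of them. Both pull data to the driver, so prefer print_sample when you only need a quick look.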
