[pyspark] foreach + print produces no output
11-02-2019 12:40 AM
The following code produces no output. It seems as if print(x) is not being executed for each element of "words":
words = sc.parallelize(
    ["scala",
     "java",
     "hadoop",
     "spark",
     "akka",
     "spark vs hadoop",
     "pyspark",
     "pyspark and spark"]
)

def f(x):
    print(x)

fore = words.foreach(f)
Any idea?
Thanks in advance
Labels:
- Pyspark
11-04-2019 12:59 PM
The RDD.foreach method runs on the cluster, so each worker that holds a partition of the records executes the function passed to foreach. In other words, your code is running, but it prints to the Spark workers' stdout, not to the driver/your shell session.
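One way to see that the function really does run for every element is to make the side effect visible through an accumulator instead of print; a minimal sketch, assuming the "words" RDD and the SparkContext "sc" from your question:

counter = sc.accumulator(0)

def increment(x):
    counter.add(1)   # runs on the executors, once per element

words.foreach(increment)
print(counter.value)  # prints 8 on the driver, so the function ran for every element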
If you just want to see the values, an easy alternative is to print them on the driver:

for w in words.toLocalIterator():
    print(w)
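Two other common options, assuming the dataset is small enough to bring to the driver, are take(n) and collect(); a quick sketch:

for w in words.take(3):       # at most the first 3 elements
    print(w)

for w in words.collect():     # the entire RDD as a Python list
    print(w)

For larger RDDs, toLocalIterator() is the gentler choice, since it streams one partition at a time instead of materialising the whole dataset in driver memory.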

