Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

[pyspark] foreach + print produces no output

JulioManuelNava
New Contributor

The following code produces no output. It seems as if print(x) is not being executed for each element of "words":

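# sc is the SparkContext predefined in the pyspark shell and in Databricks notebooks
words = sc.parallelize([
    "scala",
    "java",
    "hadoop",
    "spark",
    "akka",
    "spark vs hadoop",
    "pyspark",
    "pyspark and spark",
])

def f(x):
    print(x)

words.foreach(f)  # foreach is an action and returns None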

Any idea?

Thanks in advance

DiegoAlves
New Contributor II

The RDD.foreach method in Spark runs on the cluster: each worker that holds some of these records runs the function you pass to foreach on them. In other words, your code is running, but the print output goes to the Spark workers' stdout, not to the driver or your shell session.

A simple alternative is to bring the elements back to the driver and print them there:

for w in words.toLocalIterator():
    print(w)
