Spark + Python - Java gateway process exited before sending the driver its port number?

lau_thiamkok
New Contributor II

Why do I get this error in my browser:

<type 'exceptions.Exception'>: Java gateway process exited before sending the driver its port number args = ('Java gateway process exited before sending the driver its port number',) message = 'Java gateway process exited before sending the driver its port number'

for this script:

#!/Python27/python
print "Content-type: text/html; charset=utf-8"
print
# enable debugging
import cgitb
cgitb.enable()
import os
import sys
# Path for the Spark installation (raw strings avoid backslash-escape surprises on Windows)
os.environ['SPARK_HOME'] = r"C:\Apache\spark-1.4.1"
# Append pyspark (and the bundled py4j, unless it is installed separately) to the Python path
sys.path.append(r"C:\Apache\spark-1.4.1\python")
sys.path.append(r"C:\Apache\spark-1.4.1\python\lib\py4j-0.8.2.1-src.zip")  # py4j version may differ
from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.sql import SQLContext
print("Successfully imported Spark Modules")
# Initialize SparkContext
sc = SparkContext('local')
words = sc.parallelize(["scala", "java", "hadoop", "spark", "akka"])
print words.count()
sc.stop()

I'm on Windows 7. My Spark version is spark-1.4.1-bin-hadoop2.6.tgz, prebuilt for Hadoop 2.6 and later.

Any ideas how I can fix it?
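
This exception is raised by PySpark's launcher when the java child process that hosts the py4j gateway exits (or never starts) before reporting its port, so the first thing to check is whether java is launchable from the same environment that runs the script. A minimal pre-flight sketch, assuming Java is installed; the JAVA_HOME path below is only a hypothetical example:

import os
import subprocess

# Hypothetical install location; point this at your actual JDK/JRE.
os.environ.setdefault('JAVA_HOME', r'C:\Program Files\Java\jre7')

java_bin = os.path.join(os.environ['JAVA_HOME'], 'bin', 'java')
try:
    # `java -version` prints to stderr; a missing binary or non-zero exit
    # means PySpark cannot spawn the gateway process either.
    subprocess.check_call([java_bin, '-version'])
except (OSError, subprocess.CalledProcessError) as e:
    print("Java is not launchable, so the py4j gateway cannot start: %s" % e)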

5 REPLIES

jmdvinodjmd
New Contributor II

I am facing the same problem. :( Has anybody found a solution?

lau_thiamkok
New Contributor II

I got it fixed by running the script under Apache mod_wsgi. Do NOT run it under CGI!

lau_thiamkok
New Contributor II

This is a working example running under WSGI:

import os
import sys

# Path for the Spark installation
os.environ['SPARK_HOME'] = r"C:\Apache\spark-1.4.1"

# Append pyspark to the Python path
sys.path.append(r"C:\Apache\spark-1.4.1\python")

from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.sql import SQLContext

# This is our application object. It could have any name,
# except when using mod_wsgi, where it must be "application".
def application(environ, start_response):
    # Initialize SparkContext
    sc = SparkContext('local')
    words = sc.parallelize(["scala", "java", "hadoop", "spark", "akka"])
    count = words.count()
    sc.stop()

    response_body = "Successfully imported Spark Modules and the total words are: " + str(count)

    # HTTP response code and message
    status = '200 OK'

    # These are HTTP headers expected by the client.
    # They must be wrapped as a list of tupled pairs:
    # [(Header name, Header value)].
    response_headers = [('Content-Type', 'text/plain'),
                        ('Content-Length', str(len(response_body)))]

    # Send them to the server using the supplied function
    start_response(status, response_headers)

    # Return the response body.
    # Notice it is wrapped in a list although it could be any iterable.
    return [response_body]
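
To try the handler without a full Apache setup, Python's standard wsgiref server can serve the same application object locally. A quick smoke-test sketch, assuming the code above is saved as spark_wsgi.py (a hypothetical filename):

# Serve the WSGI app above on localhost for a quick smoke test.
from wsgiref.simple_server import make_server
from spark_wsgi import application  # hypothetical module name for the example above

httpd = make_server('127.0.0.1', 8000, application)
print("Serving on http://127.0.0.1:8000/ ...")
httpd.serve_forever()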

Sunil
New Contributor II

Hi, we are using Anaconda + CDH. pyspark works well, but using IPython gives:

Java gateway process exited before sending the driver its port number
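
In the Anaconda + CDH case the usual culprit is the same: the IPython process does not inherit the environment that the pyspark shell script normally sets up. A sketch of setting it by hand inside IPython, assuming conventional CDH parcel paths; both paths below are examples to adjust for your cluster, not prescriptions:

import os
import sys

# Hypothetical CDH locations; verify the actual paths on your own nodes.
os.environ['SPARK_HOME'] = '/opt/cloudera/parcels/CDH/lib/spark'
os.environ['JAVA_HOME'] = '/usr/java/jdk1.7.0_67-cloudera'

sys.path.insert(0, os.path.join(os.environ['SPARK_HOME'], 'python'))
# py4j ships with Spark; the version in the filename may differ.
sys.path.insert(0, os.path.join(os.environ['SPARK_HOME'], 'python', 'lib', 'py4j-0.8.2.1-src.zip'))

from pyspark import SparkContext

sc = SparkContext('local')
print(sc.parallelize(range(100)).count())
sc.stop()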

EricaLi
New Contributor II

I'm facing the same problem. Does anybody know how to connect to Spark from an IPython notebook?

The issue I created:

https://github.com/jupyter/notebook/issues/743
