
Spark + Python - Java gateway process exited before sending the driver its port number?

lau_thiamkok
New Contributor II

Why do I get this error on my browser screen:

<type 'exceptions.Exception'>: Java gateway process exited before sending the driver its port number args = ('Java gateway process exited before sending the driver its port number',) message = 'Java gateway process exited before sending the driver its port number'

For this script:

#!/Python27/python
print "Content-type: text/html; charset=utf-8"
print

# enable debugging
import cgitb
cgitb.enable()

import os
import sys

# Path for spark source folder (raw string so backslashes stay literal)
os.environ['SPARK_HOME'] = r"C:\Apache\spark-1.4.1"

# Append pyspark to Python Path
sys.path.append(r"C:\Apache\spark-1.4.1\python")

from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.sql import SQLContext
print "Successfully imported Spark Modules"

# Initialize SparkContext
sc = SparkContext('local')
words = sc.parallelize(["scala", "java", "hadoop", "spark", "akka"])
print words.count()
sc.stop()

I'm on Windows 7. My Spark version is spark-1.4.1-bin-hadoop2.6.tgz, prebuilt for Hadoop 2.6 and later.

Any ideas how I can fix it?
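For anyone hitting this: the error is raised when the Java child process that PySpark launches dies before reporting the Py4J gateway port back to the driver. A common culprit is that JAVA_HOME (or java on PATH) is not visible in the environment the script runs under; CGI in particular launches scripts with a stripped-down environment. A minimal sketch of the check, with a hypothetical JDK path:

import os

# Hypothetical JDK install path -- replace with your actual location.
# CGI handlers typically do not inherit the system environment, so
# JAVA_HOME set in Windows system settings may be missing here.
os.environ.setdefault('JAVA_HOME', r"C:\Java\jdk1.7.0")

# Make sure java.exe is reachable; PySpark spawns it to start the
# gateway, and if that spawn fails you get exactly this error.
os.environ['PATH'] = (os.path.join(os.environ['JAVA_HOME'], 'bin')
                      + os.pathsep + os.environ.get('PATH', ''))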

5 REPLIES

jmdvinodjmd
New Contributor II

I am facing the same problem. :( Has anybody found a solution?

lau_thiamkok
New Contributor II

I got it fixed by running the script under Apache mod_wsgi. Do NOT run it under CGI!

lau_thiamkok
New Contributor II

This is a working example running under mod_wsgi:

import os
import sys

# Path for spark source folder
os.environ['SPARK_HOME'] = r"C:\Apache\spark-1.4.1"

# Append pyspark to Python Path
sys.path.append(r"C:\Apache\spark-1.4.1\python")

from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.sql import SQLContext

# This is our application object. It could have any name,
# except when using mod_wsgi, where it must be "application".
def application(environ, start_response):
    # Initialize SparkContext
    sc = SparkContext('local')
    words = sc.parallelize(["scala", "java", "hadoop", "spark", "akka"])
    count = words.count()
    sc.stop()

    response_body = "Successfully imported Spark Modules and the total words are: " + str(count)

    # HTTP response code and message
    status = '200 OK'

    # These are HTTP headers expected by the client.
    # They must be wrapped as a list of tupled pairs:
    # [(Header name, Header value)].
    response_headers = [('Content-Type', 'text/plain'),
                        ('Content-Length', str(len(response_body)))]

    # Send them to the server using the supplied function
    start_response(status, response_headers)

    # Return the response body.
    # Notice it is wrapped in a list although it could be any iterable.
    return [response_body]
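One caveat about the example above: Spark allows only a single active SparkContext per process, and mod_wsgi keeps its processes alive across requests, so creating and stopping a context inside application() is slow and can fail under concurrent requests. A sketch of caching it at module level instead (the helper name is mine, not from the thread):

from pyspark import SparkContext

_sc = None  # cached per-process context

def get_spark_context():
    # Spark permits only one active SparkContext per process, so
    # create it lazily on first use and reuse it afterwards.
    # (A production app would guard this with a threading.Lock.)
    global _sc
    if _sc is None:
        _sc = SparkContext('local')
    return _sc

def application(environ, start_response):
    count = get_spark_context().parallelize(["scala", "java"]).count()
    response_body = "total words: " + str(count)
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(response_body)))])
    return [response_body]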

Sunil
New Contributor II

Hi, we are using Anaconda + CDH. pyspark works fine, but running it under IPython gives:

Java gateway process exited before sending the driver its port number

EricaLi
New Contributor II

I'm facing the same problem. Does anybody know how to connect to Spark from an IPython notebook?

The issue I created:

https://github.com/jupyter/notebook/issues/743
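For the IPython/Jupyter case: the notebook kernel needs the same environment the pyspark shell sets up for you (SPARK_HOME and JAVA_HOME visible, and pyspark on sys.path), otherwise you get this exact gateway error. One approach that often works, not from this thread, is the findspark package (pip install findspark); a sketch with hypothetical paths:

import os

# Hypothetical locations -- point these at your actual installs.
os.environ.setdefault('SPARK_HOME', '/opt/spark')
os.environ.setdefault('JAVA_HOME', '/usr/java/default')

import findspark
findspark.init()  # puts $SPARK_HOME/python (and py4j) onto sys.path

from pyspark import SparkContext
sc = SparkContext('local')
print(sc.parallelize(["scala", "java", "hadoop"]).count())
sc.stop()

Alternatively, launching the shell as PYSPARK_DRIVER_PYTHON=ipython pyspark starts IPython with the gateway already wired up, without touching sys.path yourself.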
