Warehousing & Analytics

How can I run a PySpark Python script in a Scala environment?

177331
New Contributor II

I need to use both Python Spark code and Scala Spark code in my project. A lot of the project configuration is written on the Scala side, so I want to generate the data from Scala and pass the data path to my Python script. The Python script can then use the Python ecosystem to train models and so on, producing a dataset as its result. Scala will then read that result and pass it into our downstream system.
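Roughly, the hand-off I have in mind looks like this (a simplified sketch; the path, the row count, and the --input flag are just placeholders, not the real project values):

import sys.process._

// Scala side: materialize the generated data where the Python script can read it
val dataPath = "dbfs:/tmp/handoff/input.parquet" // placeholder path
spark.range(100).toDF("id").write.mode("overwrite").parquet(dataPath)

// Hand the path to the Python CLI; Python trains models and writes its own output
val exitCode = Seq("python", "/tmp/cli.py", "--input", dataPath).!
println(s"cli.py exited with $exitCode")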

However, when I tested the code below, I ran into some issues. Am I doing anything wrong? Is there a better way to achieve my goal?

Cmd 2, which runs a print-hello script, works fine.
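It was something along these lines (a simplified sketch, not the exact cell):

import sys.process._

// Plain Python succeeds here because it never has to launch a Spark gateway
val exit = Seq("python", "-c", "print('hello')").!
println(s"exit code: $exit")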

Cmd 4, which runs the PySpark Python script, produces this error:

Error: Could not find or load main class org.apache.spark.launcher.Main
/databricks/spark/bin/spark-class: line 101: CMD: bad array subscript
Traceback (most recent call last):
  File "/tmp/cli.py", line 23, in <module>
    cli.main(sys.argv[1:], standalone_mode=False)
  File "/databricks/python3/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/databricks/python3/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/databricks/python3/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/tmp/cli.py", line 19, in cli
    spark = SparkSession.builder.getOrCreate()
  File "/databricks/spark/python/pyspark/sql/session.py", line 229, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "/databricks/spark/python/pyspark/context.py", line 392, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/databricks/spark/python/pyspark/context.py", line 145, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  File "/databricks/spark/python/pyspark/context.py", line 339, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
  File "/databricks/spark/python/pyspark/java_gateway.py", line 108, in launch_gateway
    raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number

The Scala cell's own output was:

stdout: java.io.PrintStream@2202fa90
stderr: java.io.PrintStream@4133a68d
import sys.process._
callPythonCli: ()Unit
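For context, Cmd 4 defined and called callPythonCli, roughly like this (a reconstruction from the output above, not the verbatim cell):

import sys.process._

def callPythonCli(): Unit = {
  // These two prints would produce the PrintStream lines shown above (guessed)
  println(s"stdout: ${System.out}")
  println(s"stderr: ${System.err}")
  // Launch the PySpark CLI script from the traceback as a child process
  val exit = Seq("python", "/tmp/cli.py").!
  println(s"exit code: $exit")
}

callPythonCli()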

1 REPLY

177331
New Contributor II

The error is shown in the attached image:

[attachment: image.png]
