Warehousing & Analytics
Engage in discussions on data warehousing, analytics, and BI solutions within the Databricks Community. Share insights, tips, and best practices for leveraging data for informed decision-making.

How can I run a PySpark Python script in a Scala environment?

177331
New Contributor II

I need to use both Python Spark code and Scala Spark code in my project. A lot of the project configuration is written in the Scala part, so I want to generate the data from Scala and pass the data path to my Python script. The Python script can then use the Python ecosystem to train models and produce a dataset as its result, and Scala will read that result and pass it into our downstream system.
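Concretely, the handoff I have in mind looks like this on the Scala side (a minimal sketch using the notebook's spark session; the paths here are placeholders, not my real project code):

// Generate the input data from the Scala side and record where it lives.
val inputPath = "dbfs:/tmp/myproject/input"    // placeholder path
val resultPath = "dbfs:/tmp/myproject/result"  // placeholder path

spark.range(1000).toDF("id")
  .write.mode("overwrite").parquet(inputPath)

// Hand both paths to the Python script (the subprocess call is sketched
// further below); it trains the model and writes its output dataset to
// resultPath. dbfs:/ paths are visible to local Python as /dbfs/... via
// the DBFS FUSE mount.

// Afterwards, pick up the Python-produced result in Scala and pass it
// to the downstream system.
val result = spark.read.parquet(resultPath)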

However, when I tested the code below, I ran into some issues. Am I doing anything wrong? Is there a better way to achieve my goal?

Cmd 2, which runs a simple print-hello script, works well.
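It is essentially this (a rough reconstruction, since the actual cell is only in the attached screenshot; /tmp/hello.py is a placeholder name for the hello script):

import sys.process._

// Shell out to a plain Python script that just prints "hello".
val helloExit = "python /tmp/hello.py".!
println(s"hello exit code: $helloExit")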

Cmd 4, which runs the PySpark Python script, produces the following error:

Error: Could not find or load main class org.apache.spark.launcher.Main
/databricks/spark/bin/spark-class: line 101: CMD: bad array subscript
Traceback (most recent call last):
  File "/tmp/cli.py", line 23, in <module>
    cli.main(sys.argv[1:], standalone_mode=False)
  File "/databricks/python3/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/databricks/python3/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/databricks/python3/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/tmp/cli.py", line 19, in cli
    spark = SparkSession.builder.getOrCreate()
  File "/databricks/spark/python/pyspark/sql/session.py", line 229, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "/databricks/spark/python/pyspark/context.py", line 392, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/databricks/spark/python/pyspark/context.py", line 145, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  File "/databricks/spark/python/pyspark/context.py", line 339, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
  File "/databricks/spark/python/pyspark/java_gateway.py", line 108, in launch_gateway
    raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number

The rest of the output from the Scala cell:

stdout: java.io.PrintStream@2202fa90

stderr: java.io.PrintStream@4133a68d

import sys.process._

callPythonCli: ()Unit
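For reference, Cmd 4 is roughly the following (a hedged reconstruction from the output above; /tmp/cli.py is the click-based script from the traceback, whose line 19 calls SparkSession.builder.getOrCreate(), and the ProcessLogger wiring is my best guess at the original cell):

import sys.process._

def callPythonCli(): Unit = {
  // Forward the child process output line by line instead of printing
  // the PrintStream objects themselves.
  val logger = ProcessLogger(
    out => println(s"stdout: $out"),
    err => println(s"stderr: $err"))

  // /tmp/cli.py builds its own SparkSession, which is where the
  // "Java gateway process exited" exception above is raised.
  val exitCode = Seq("python", "/tmp/cli.py").!(logger)
  println(s"cli exit code: $exitCode")
}

callPythonCli()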

1 REPLY

177331
New Contributor II

The error is in the attachment:


image.png
