
pytest error

hong
New Contributor II

Hello,

I have a quick question. If my source code calls PySpark's collect() or any other RDD-related method, pytest on my local PC reports the error below. My local machine doesn't have any special PySpark configuration; I use the findspark package. If you know the solution, it would be greatly appreciated. Thanks.

src\tests\test_calculate_psi_for_each_column.py:14: in <module>
spark = spu.create_spark_obj()
src\prototype\supplementary_utils.py:959: in create_spark_obj
spark = SparkSession.builder.appName("ABC").getOrCreate()
venv\lib\site-packages\pyspark\sql\session.py:269: in getOrCreate
sc = SparkContext.getOrCreate(sparkConf)
venv\lib\site-packages\pyspark\context.py:483: in getOrCreate
SparkContext(conf=conf or SparkConf())
venv\lib\site-packages\pyspark\context.py:197: in __init__
self._do_init(
venv\lib\site-packages\pyspark\context.py:282: in _do_init
self._jsc = jsc or self._initialize_context(self._conf._jconf)
venv\lib\site-packages\pyspark\context.py:402: in _initialize_context
return self._jvm.JavaSparkContext(jconf)
venv\lib\site-packages\py4j\java_gateway.py:1585: in __call__
return_value = get_return_value(
venv\lib\site-packages\py4j\protocol.py:326: in get_return_value
raise Py4JJavaError(
E py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
E : java.lang.ExceptionInInitializerError
E at org.apache.spark.unsafe.array.ByteArrayMethods.<clinit>(ByteArrayMethods.java:56)
E at org.apache.spark.memory.MemoryManager.defaultPageSizeBytes$lzycompute(MemoryManager.scala:264)
E at org.apache.spark.memory.MemoryManager.defaultPageSizeBytes(MemoryManager.scala:254)
E at org.apache.spark.memory.MemoryManager.$anonfun$pageSizeBytes$1(MemoryManager.scala:273)
E at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
E at scala.Option.getOrElse(Option.scala:189)
E at org.apache.spark.memory.MemoryManager.<init>(MemoryManager.scala:273)
E at org.apache.spark.memory.UnifiedMemoryManager.<init>(UnifiedMemoryManager.scala:58)
E at org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:207)
E at org.apache.spark.SparkEnv$.create(SparkEnv.scala:320)
E at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:194)
E at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:279)
E at org.apache.spark.SparkContext.<init>(SparkContext.scala:464)
E at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
E at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
E at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
E at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
E at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
E at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
E at py4j.Gateway.invoke(Gateway.java:238)
E at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
E at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
E at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
E at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
E at java.base/java.lang.Thread.run(Thread.java:1570)
E Caused by: java.lang.IllegalStateException: java.lang.NoSuchMethodException: java.nio.DirectByteBuffer.<init>(long,int)
E at org.apache.spark.unsafe.Platform.<clinit>(Platform.java:113)
E ... 25 more
E Caused by: java.lang.NoSuchMethodException: java.nio.DirectByteBuffer.<init>(long,int)
E at java.base/java.lang.Class.getConstructor0(Class.java:3784)
E at java.base/java.lang.Class.getDeclaredConstructor(Class.java:2955)
E at org.apache.spark.unsafe.Platform.<clinit>(Platform.java:71)
E ... 25 more
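
For reference, the session is created in my supplementary_utils.py roughly like this (a simplified sketch; only the builder call is taken verbatim from the trace above, the rest is illustrative):

```python
# Simplified sketch of my local setup (paths and names are illustrative).
import findspark

findspark.init()  # let findspark locate the pyspark installed in the venv

from pyspark.sql import SparkSession

# Same builder call as in the trace above; on my machine the JVM already
# fails here, while constructing the SparkContext, before any collect() runs.
spark = SparkSession.builder.appName("ABC").getOrCreate()
```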

4 REPLIES

brockb
Valued Contributor

Hi,

The error message and stack trace don't seem to suggest that this is a pytest issue. If that's true, can you please try to replicate what's being invoked from `src\tests\test_calculate_psi_for_each_column.py` in a Python REPL?
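
For example, something like this in a bare Python REPL, using the same builder call that appears in your trace:

```python
# Run in a plain `python` REPL inside the same venv as the tests.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ABC").getOrCreate()
print(spark.version)  # if this prints, the JVM side came up fine
```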

And separately, have you confirmed that the pyspark installation was successful? Depending on how you installed pyspark, are you able to start it by itself (independent of `findspark`), e.g. via `$SPARK_HOME/bin/pyspark`?
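
A quick way to check that from Python (just a sketch; it only confirms the package imports and shows where it resolves from):

```python
# Sanity-check the pyspark installation, independent of findspark.
import pyspark

print(pyspark.__version__)  # installed pyspark version
print(pyspark.__file__)     # path the package is actually loaded from
```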

Thanks.

hong
New Contributor II

When I run pyspark directly, it fails:

C:\Users\HXI1\hx_scripts\core-globaldata-dpm-mlops-pipeline-1\venv\Lib\site-packages\pyspark\bin\pyspark
Python 3.9.13 (main, Aug 25 2022, 23:51:50) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
24/05/27 20:40:38 WARN Shell: Did not find winutils.exe: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/05/27 20:40:39 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
24/05/27 20:40:40 WARN SparkContext: Another SparkContext is being constructed (or threw an exception in its constructor). This may indicate an error, since only one SparkContext should be running in this JVM (see SPARK-2243). The other SparkContext was created at:
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
py4j.Gateway.invoke(Gateway.java:238)
py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
py4j.ClientServerConnection.run(ClientServerConnection.java:106)
java.base/java.lang.Thread.run(Thread.java:1570)
C:\Users\HXI1\hx_scripts\core-globaldata-dpm-mlops-pipeline-1\venv\Lib\site-packages\pyspark\bin\..\python\pyspark\shell.py:44: UserWarning: Failed to initialize Spark session.
warnings.warn("Failed to initialize Spark session.")
Traceback (most recent call last):
File "C:\Users\HXI1\hx_scripts\core-globaldata-dpm-mlops-pipeline-1\venv\Lib\site-packages\pyspark\bin\..\python\pyspark\shell.py", line 39, in <module>
spark = SparkSession._create_shell_session()
File "C:\Users\HXI1\hx_scripts\core-globaldata-dpm-mlops-pipeline-1\venv\lib\site-packages\pyspark\sql\session.py", line 677, in _create_shell_session
return SparkSession._getActiveSessionOrCreate()
File "C:\Users\HXI1\hx_scripts\core-globaldata-dpm-mlops-pipeline-1\venv\lib\site-packages\pyspark\sql\session.py", line 693, in _getActiveSessionOrCreate
spark = builder.getOrCreate()
File "C:\Users\HXI1\hx_scripts\core-globaldata-dpm-mlops-pipeline-1\venv\lib\site-packages\pyspark\sql\session.py", line 269, in getOrCreate
sc = SparkContext.getOrCreate(sparkConf)
File "C:\Users\HXI1\hx_scripts\core-globaldata-dpm-mlops-pipeline-1\venv\lib\site-packages\pyspark\context.py", line 483, in getOrCreate SparkContext(conf=conf or SparkConf())
File "C:\Users\HXI1\hx_scripts\core-globaldata-dpm-mlops-pipeline-1\venv\lib\site-packages\pyspark\context.py", line 197, in __init__
self._do_init(
File "C:\Users\HXI1\hx_scripts\core-globaldata-dpm-mlops-pipeline-1\venv\lib\site-packages\pyspark\context.py", line 282, in _do_init
self._jsc = jsc or self._initialize_context(self._conf._jconf)
File "C:\Users\HXI1\hx_scripts\core-globaldata-dpm-mlops-pipeline-1\venv\lib\site-packages\pyspark\context.py", line 402, in _initialize_context
return self._jvm.JavaSparkContext(jconf)
File "C:\Users\HXI1\hx_scripts\core-globaldata-dpm-mlops-pipeline-1\venv\Lib\site-packages\pyspark\python\lib\py4j-0.10.9.5-src.zip\py4j\java_gateway.py", line 1585, in __call__
return_value = get_return_value(
File "C:\Users\HXI1\hx_scripts\core-globaldata-dpm-mlops-pipeline-1\venv\Lib\site-packages\pyspark\python\lib\py4j-0.10.9.5-src.zip\py4j\protocol.py", line 326, in get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.unsafe.array.ByteArrayMethods
at org.apache.spark.memory.MemoryManager.defaultPageSizeBytes$lzycompute(MemoryManager.scala:264)
at org.apache.spark.memory.MemoryManager.defaultPageSizeBytes(MemoryManager.scala:254)
at org.apache.spark.memory.MemoryManager.$anonfun$pageSizeBytes$1(MemoryManager.scala:273)
at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.memory.MemoryManager.<init>(MemoryManager.scala:273)
at org.apache.spark.memory.UnifiedMemoryManager.<init>(UnifiedMemoryManager.scala:58)
at org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:207)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:320)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:194)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:279)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:464)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:1570)
Caused by: java.lang.ExceptionInInitializerError: Exception java.lang.ExceptionInInitializerError [in thread "Thread-2"]
at org.apache.spark.unsafe.array.ByteArrayMethods.<clinit>(ByteArrayMethods.java:56)
... 24 more

ERROR: The process with PID 33248 (child process of PID 43512) could not be terminated.
Reason: Access is denied.
SUCCESS: The process with PID 43512 (child process of PID 14852) has been terminated.
SUCCESS: The process with PID 14852 (child process of PID 42180) has been terminated.

brockb
Valued Contributor

I don't personally have any experience running Spark on Windows. Can you please review the Wiki article referenced in the WARN message to see if it helps you complete the installation successfully? Alternatively, could you consider running the tests on Databricks if you continue to have issues with the Windows setup?
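
If the missing piece turns out to be winutils.exe, the setup described there generally amounts to downloading winutils and setting HADOOP_HOME before the session is created. A hedged sketch, untested on my side, assuming winutils.exe sits under the hypothetical path C:\hadoop\bin:

```python
# Sketch only: C:\hadoop is a hypothetical install location for winutils.exe.
import os

os.environ["HADOOP_HOME"] = r"C:\hadoop"  # the variable named in the WARN above
os.environ["PATH"] = r"C:\hadoop\bin;" + os.environ["PATH"]

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ABC").getOrCreate()
```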

Thanks.

hong
New Contributor II

Thank you very much, brockb. I will probably try it on Databricks. Thanks.
