pytest error

hong
New Contributor II

Hello,

I have a quick question. If my source code calls PySpark's collect() or any other RDD-related method, then pytest on my local PC reports the error below. My local machine doesn't have any special PySpark configuration, and I used the findspark package (a simplified sketch of the setup follows the traceback). If you know the solution, it would be greatly appreciated. Thanks.

src\tests\test_calculate_psi_for_each_column.py:14: in <module>
spark = spu.create_spark_obj()
src\prototype\supplementary_utils.py:959: in create_spark_obj
spark = SparkSession.builder.appName("ABC").getOrCreate()
venv\lib\site-packages\pyspark\sql\session.py:269: in getOrCreate
sc = SparkContext.getOrCreate(sparkConf)
venv\lib\site-packages\pyspark\context.py:483: in getOrCreate
SparkContext(conf=conf or SparkConf())
venv\lib\site-packages\pyspark\context.py:197: in __init__
self._do_init(
venv\lib\site-packages\pyspark\context.py:282: in _do_init
self._jsc = jsc or self._initialize_context(self._conf._jconf)
venv\lib\site-packages\pyspark\context.py:402: in _initialize_context
return self._jvm.JavaSparkContext(jconf)
venv\lib\site-packages\py4j\java_gateway.py:1585: in __call__
return_value = get_return_value(
venv\lib\site-packages\py4j\protocol.py:326: in get_return_value
raise Py4JJavaError(
E py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
E : java.lang.ExceptionInInitializerError
E at org.apache.spark.unsafe.array.ByteArrayMethods.<clinit>(ByteArrayMethods.java:56)
E at org.apache.spark.memory.MemoryManager.defaultPageSizeBytes$lzycompute(MemoryManager.scala:264)
E at org.apache.spark.memory.MemoryManager.defaultPageSizeBytes(MemoryManager.scala:254)
E at org.apache.spark.memory.MemoryManager.$anonfun$pageSizeBytes$1(MemoryManager.scala:273)
E at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
E at scala.Option.getOrElse(Option.scala:189)
E at org.apache.spark.memory.MemoryManager.<init>(MemoryManager.scala:273)
E at org.apache.spark.memory.UnifiedMemoryManager.<init>(UnifiedMemoryManager.scala:58)
E at org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:207)
E at org.apache.spark.SparkEnv$.create(SparkEnv.scala:320)
E at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:194)
E at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:279)
E at org.apache.spark.SparkContext.<init>(SparkContext.scala:464)
E at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
E at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
E at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
E at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
E at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
E at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
E at py4j.Gateway.invoke(Gateway.java:238)
E at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
E at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
E at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
E at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
E at java.base/java.lang.Thread.run(Thread.java:1570)
E Caused by: java.lang.IllegalStateException: java.lang.NoSuchMethodException: java.nio.DirectByteBuffer.<init>(long,int)
E at org.apache.spark.unsafe.Platform.<clinit>(Platform.java:113)
E ... 25 more
E Caused by: java.lang.NoSuchMethodException: java.nio.DirectByteBuffer.<init>(long,int)
E at java.base/java.lang.Class.getConstructor0(Class.java:3784)
E at java.base/java.lang.Class.getDeclaredConstructor(Class.java:2955)
E at org.apache.spark.unsafe.Platform.<clinit>(Platform.java:71)
E ... 25 more
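
For reference, the session creation in `src\prototype\supplementary_utils.py` boils down to the following. This is a simplified sketch, not the actual file: the `getOrCreate()` call matches the traceback above, and the `findspark` lines reflect how my environment locates Spark.

```python
# Simplified sketch of the setup under test (illustrative, not the real module).
import findspark

findspark.init()  # locate the local Spark installation before importing pyspark

from pyspark.sql import SparkSession


def create_spark_obj():
    # Matches the call shown at supplementary_utils.py:959 in the traceback
    return SparkSession.builder.appName("ABC").getOrCreate()


spark = create_spark_obj()  # this is where the Py4JJavaError is raised
```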

4 REPLIES

brockb
Contributor III

Hi,

The error message and stack trace don't seem to suggest that this is a pytest failure as such. If that's true, can you please try to replicate what's being invoked from `src\tests\test_calculate_psi_for_each_column.py` in a Python REPL?
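
For example, something along these lines in a plain REPL (a minimal sketch; the `local[1]` master and app name are illustrative, not settings from your project):

```python
# Minimal reproduction attempt in a plain Python REPL -- no pytest, no findspark.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("local[1]")   # illustrative: a single local core
    .appName("repro")
    .getOrCreate()
)
print(spark.version)
spark.stop()
```

If that raises the same Py4JJavaError, the problem is in the local Spark/JVM setup rather than in pytest.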

And separately, have you confirmed that the pyspark installation was successful? Depending on how you installed pyspark, are you able to start pyspark by itself (independent of `findspark`) such as `$SPARK_HOME/bin/pyspark`?

Thanks.

hong
New Contributor II

When I run pyspark directly, it fails:

C:\Users\HXI1\hx_scripts\core-globaldata-dpm-mlops-pipeline-1\venv\Lib\site-packages\pyspark\bin\pyspark
Python 3.9.13 (main, Aug 25 2022, 23:51:50) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
24/05/27 20:40:38 WARN Shell: Did not find winutils.exe: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/05/27 20:40:39 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
24/05/27 20:40:40 WARN SparkContext: Another SparkContext is being constructed (or threw an exception in its constructor). This may indicate an error, since only one SparkContext should be running in this JVM (see SPARK-2243). The other SparkContext was created at:
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
py4j.Gateway.invoke(Gateway.java:238)
py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
py4j.ClientServerConnection.run(ClientServerConnection.java:106)
java.base/java.lang.Thread.run(Thread.java:1570)
C:\Users\HXI1\hx_scripts\core-globaldata-dpm-mlops-pipeline-1\venv\Lib\site-packages\pyspark\bin\..\python\pyspark\shell.py:44: UserWarning: Failed to initialize Spark session.
warnings.warn("Failed to initialize Spark session.")
Traceback (most recent call last):
File "C:\Users\HXI1\hx_scripts\core-globaldata-dpm-mlops-pipeline-1\venv\Lib\site-packages\pyspark\bin\..\python\pyspark\shell.py", line 39, in <module>
spark = SparkSession._create_shell_session()
File "C:\Users\HXI1\hx_scripts\core-globaldata-dpm-mlops-pipeline-1\venv\lib\site-packages\pyspark\sql\session.py", line 677, in _create_shell_session
return SparkSession._getActiveSessionOrCreate()
File "C:\Users\HXI1\hx_scripts\core-globaldata-dpm-mlops-pipeline-1\venv\lib\site-packages\pyspark\sql\session.py", line 693, in _getActiveSessionOrCreate
spark = builder.getOrCreate()
File "C:\Users\HXI1\hx_scripts\core-globaldata-dpm-mlops-pipeline-1\venv\lib\site-packages\pyspark\sql\session.py", line 269, in getOrCreate
sc = SparkContext.getOrCreate(sparkConf)
File "C:\Users\HXI1\hx_scripts\core-globaldata-dpm-mlops-pipeline-1\venv\lib\site-packages\pyspark\context.py", line 483, in getOrCreate SparkContext(conf=conf or SparkConf())
File "C:\Users\HXI1\hx_scripts\core-globaldata-dpm-mlops-pipeline-1\venv\lib\site-packages\pyspark\context.py", line 197, in __init__
self._do_init(
File "C:\Users\HXI1\hx_scripts\core-globaldata-dpm-mlops-pipeline-1\venv\lib\site-packages\pyspark\context.py", line 282, in _do_init
self._jsc = jsc or self._initialize_context(self._conf._jconf)
File "C:\Users\HXI1\hx_scripts\core-globaldata-dpm-mlops-pipeline-1\venv\lib\site-packages\pyspark\context.py", line 402, in _initialize_context
return self._jvm.JavaSparkContext(jconf)
File "C:\Users\HXI1\hx_scripts\core-globaldata-dpm-mlops-pipeline-1\venv\Lib\site-packages\pyspark\python\lib\py4j-0.10.9.5-src.zip\py4j\java_gateway.py", line 1585, in __call__
return_value = get_return_value(
File "C:\Users\HXI1\hx_scripts\core-globaldata-dpm-mlops-pipeline-1\venv\Lib\site-packages\pyspark\python\lib\py4j-0.10.9.5-src.zip\py4j\protocol.py", line 326, in get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.unsafe.array.ByteArrayMethods
at org.apache.spark.memory.MemoryManager.defaultPageSizeBytes$lzycompute(MemoryManager.scala:264)
at org.apache.spark.memory.MemoryManager.defaultPageSizeBytes(MemoryManager.scala:254)
at org.apache.spark.memory.MemoryManager.$anonfun$pageSizeBytes$1(MemoryManager.scala:273)
at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.memory.MemoryManager.<init>(MemoryManager.scala:273)
at org.apache.spark.memory.UnifiedMemoryManager.<init>(UnifiedMemoryManager.scala:58)
at org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:207)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:320)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:194)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:279)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:464)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:1570)
Caused by: java.lang.ExceptionInInitializerError: Exception java.lang.ExceptionInInitializerError [in thread "Thread-2"]
at org.apache.spark.unsafe.array.ByteArrayMethods.<clinit>(ByteArrayMethods.java:56)
... 24 more

ERROR: The process with PID 33248 (child process of PID 43512) could not be terminated.
Reason: Access is denied.
SUCCESS: The process with PID 43512 (child process of PID 14852) has been terminated.
SUCCESS: The process with PID 14852 (child process of PID 42180) has been terminated.

brockb
Contributor III

I don't personally have any experience running Spark on Windows. Can you please review the wiki article referenced in the WARN message to see if it helps you complete the installation successfully? Alternatively, could you consider running the tests on Databricks if you continue having issues with the Windows setup?
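
One more hedged pointer (an assumption on my part, not something the trace proves): the `java.lang.NoSuchMethodException: java.nio.DirectByteBuffer.<init>(long,int)` in your first stack trace is commonly reported when the JVM is newer than the Spark release supports (for example, JDK 21+ against a Spark 3.x build that targets Java 8/11/17). A quick way to see what your environment actually resolves:

```python
# Environment sanity check -- assumes `java` is on PATH; the variable names
# checked below are the standard Spark/Hadoop ones.
import os
import subprocess

# `java -version` writes its output to stderr, not stdout
result = subprocess.run(["java", "-version"], capture_output=True, text=True)
print(result.stderr.strip())

for var in ("JAVA_HOME", "SPARK_HOME", "HADOOP_HOME"):
    print(f"{var} = {os.environ.get(var)}")
```

If the JVM does turn out to be newer than your Spark release supports, pointing `JAVA_HOME` at a supported JDK may help.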

Thanks.

hong
New Contributor II

Thank you very much, brockb. I will probably try it in Databricks. Thanks.
