Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

PySpark RDD fails with pytest

danniely
New Contributor II

When I call RDD APIs during pytest, it seems like the module "serializers.py" cannot find any other modules under pyspark.

I've already looked this up on the internet, and it seems like the pyspark modules are not properly importing the other modules they reference.

I've seen others experiencing a similar issue:

https://stackoverflow.com/questions/53863576/modulenotfounderror-because-pyspark-serializer-is-not-a...

I tried zipping the whole pyspark package and loading it when creating the Spark session with the addPyFile() method, but unfortunately it was no use.

Could anyone help me out with this?
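
For context, the failure shows up even on a trivial RDD test like the following (just an illustrative sketch of what I'm running; the fixture and names are mine):

import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    session = SparkSession.builder.master("local[2]").appName("rdd-test").getOrCreate()
    yield session
    session.stop()

def test_rdd_map(spark):
    # RDD operations go through pyspark's serializers, which is where the import error surfaces
    rdd = spark.sparkContext.parallelize([1, 2, 3])
    assert rdd.map(lambda x: x * 2).collect() == [2, 4, 6]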

1 REPLY

Anonymous
Not applicable

@hyunho lee: It sounds like you are encountering an issue with PySpark's serializer not being able to find the necessary modules when testing with pytest. One solution you could try is to set the PYTHONPATH environment variable to include the path to your PySpark installation before the tests run. This can be done by adding the following lines to your test script (or conftest.py) before anything imports pyspark:

import os, sys
os.environ['PYTHONPATH'] = '/path/to/pyspark'  # picked up by any interpreters Spark launches later
sys.path.insert(0, '/path/to/pyspark')  # the already-running pytest process needs sys.path updated directly

Replace /path/to/pyspark with the actual path to your PySpark installation directory.
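
If you are testing against a full Spark distribution rather than a pip-installed pyspark, a conftest.py variant of the same idea might look like this (SPARK_HOME and all paths here are assumptions to adjust for your setup; the findspark package does the same bookkeeping for you):

import glob
import os
import sys

SPARK_HOME = os.environ.get("SPARK_HOME", "/path/to/spark")  # assumed install location
spark_python = os.path.join(SPARK_HOME, "python")
py4j_zips = glob.glob(os.path.join(spark_python, "lib", "py4j-*-src.zip"))

# Make pyspark importable in the pytest process itself...
for path in [spark_python] + py4j_zips:
    sys.path.insert(0, path)

# ...and in any worker interpreters that Spark launches.
os.environ["PYTHONPATH"] = os.pathsep.join(
    [spark_python] + py4j_zips + [os.environ.get("PYTHONPATH", "")]
)

Because pytest imports conftest.py before your test modules, this runs early enough for the pyspark imports in your tests to resolve.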

Another solution you could try is to use the PYSPARK_PYTHON environment variable to specify the Python executable PySpark's workers should use. You can set this variable to the Python executable under which PySpark is installed. For example:

import os
os.environ['PYSPARK_PYTHON'] = '/path/to/python'

Replace /path/to/python with the actual path to your Python executable.
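
In a pytest run specifically, a convenient variant (again, just a sketch) is to point both the driver and the workers at the interpreter that is already executing the tests, so both sides serialize and deserialize with the same Python and the same pyspark installation:

import os
import sys

# Use the interpreter running pytest for both the driver and the workers,
# so the serializers on each side come from the same pyspark installation.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

Set these before the SparkSession is created, e.g. at the top of conftest.py.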

I hope this helps! Let me know if you have any further questions.
