Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Unit Testing with PyTest in Databricks - ModuleNotFoundError

StephanKnox
New Contributor II

Dear all,

I am following the guide in this article: https://docs.databricks.com/en/notebooks/testing.html
However, I am unable to run pytest due to the following error: ImportError while importing test module '/Workspace/Users/deadmanhide@gmail.com/test_trans.py'

and

E ModuleNotFoundError: No module named 'test_trans'

 

My setup is:

Workspace:

run_tests.py (NoteBook where I install pytest and run pytest.main)

test_trans.py(Python file containing the unit tests)

transform (folder)

    -- operations.py(Notebook with transform and cleansing functions)

    -- __init__.py

I also tried putting the test file and the transform file inside folders containing an __init__.py file so they would be treated as packages, and I tried the same setup in Repos instead of the workspace.

Clearly I am doing something wrong; I would greatly appreciate any help.
Kind regards

3 REPLIES

Kaniz
Community Manager

Hi @StephanKnox, ensure that your directory structure is set up correctly. Based on your description, it should look something like this:

Workspace/
├── run_tests.py
├── test_trans.py
└── transform/
    ├── operations.py
    └── __init__.py
  • Add an empty __init__.py file to both the transform folder and the root directory (if you haven’t already). This makes Python recognize these directories as packages. You mentioned trying this, but let’s make sure it’s done correctly.
  • Sometimes pytest cannot discover your modules due to path issues. Try one of the following:
    • Define the PYTHONPATH environment variable to include the root directory of your project:
      export PYTHONPATH=/path/to/Workspace
    • Run pytest using python -m pytest instead of just pytest. This adds the current directory to the module search path, which can help pytest find your modules correctly.
  • Navigate to the root directory (where run_tests.py and test_trans.py are located) and run pytest from there:
      python -m pytest
  • In your test_trans.py file, ensure that you’re importing the necessary modules correctly. For example, if you’re trying to import something from transform.operations, use:
      from transform.operations import my_function
  • Make sure you’re using the correct Python version (the same one you used to install pytest).
  • If you haven’t already, install pytest in your workspace:
      !pip install pytest
  • Sometimes cached imports can cause issues. Try running pytest with the --cache-clear flag:
      python -m pytest --cache-clear

 If you encounter any further problems, feel free to ask for more assistance! 😊
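Putting the path and cache advice above together, a minimal first cell for a run_tests.py notebook might look like the sketch below. The project-root value here is a hypothetical stand-in; on Databricks, substitute the workspace folder that actually contains test_trans.py and the transform package.

```python
import os
import sys

# Hypothetical stand-in for the project root; on Databricks this would be
# the workspace folder containing test_trans.py and transform/, e.g.
# "/Workspace/Users/<you>/project".
project_root = os.getcwd()

# Prepend the root so `from transform.operations import my_function`
# resolves: Python searches sys.path entries in order when importing.
if project_root not in sys.path:
    sys.path.insert(0, project_root)

# Skip writing .pyc bytecode files, as the Databricks testing guide
# recommends for workspace files.
sys.dont_write_bytecode = True

# Then hand control to pytest, e.g.:
#   import pytest
#   retcode = pytest.main(["-v", "--cache-clear", "."])
#   assert retcode == 0, "Some tests failed"
```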


StephanKnox
New Contributor II

Thank you very much for the detailed answer, Kaniz! I have followed the steps you described and simplified the structure further: all 3 files are now at the same level, and pytest now collects my 2 tests.

The issue is with the module imports again. I have followed the Databricks unit-testing guide, but it seems I am still missing something.

In my workspace I have two notebooks (transform_functions and run_tests) and a Python file, test_trans.py, all at the same level.

The error I am getting is:


FAILED test_trans.py::test_check_columns_exist - ModuleNotFoundError: No module named 'transform_functions'
FAILED test_trans.py::test_transform_replace_nulls - ModuleNotFoundError: No module named 'transform_functions'

    def test_check_columns_exist(get_sparksession, get_test_df):
    >       from transform_functions import *
    E       ModuleNotFoundError: No module named 'transform_functions'

    test_trans.py:36: ModuleNotFoundError

 

transform_functions is a notebook at the same level as test_trans.py, so I am a bit confused as to why I am getting this error...
Full unit test function code which causes the error:

import pytest
import pyspark
from pyspark.sql import SparkSession, DataFrame
from pyspark.testing import assertDataFrameEqual, assertSchemaEqual
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, FloatType, DateType, ArrayType


TESTFILE_PATH = 'file:/Workspace/Users/deadmanhide@gmail.com/test_data/testdata.json'

@pytest.fixture()
def get_sparksession():
    return SparkSession.builder \
                    .appName('integrity-tests') \
                    .getOrCreate()


@pytest.fixture()
def get_test_df(get_sparksession) -> DataFrame:
    spark = get_sparksession
    expected_schema = StructType([
        StructField("purchase_date", StringType(), True),
        StructField("customer_id", IntegerType(), True),
        StructField("amount", FloatType(), True),
        StructField("category", StringType(), True),
        StructField("city", StringType(), True),
        StructField("address", StructType([
            StructField("street_name", StringType(), True),
            StructField("street_number", IntegerType(), True),
            StructField("zip_code", StringType(), True)
        ]), True)
    ])

    return spark.read.format('json').schema(expected_schema).load(TESTFILE_PATH)


def test_check_columns_exist(get_sparksession, get_test_df):
    from transform_functions import *
    spark = get_sparksession
    test_df = get_test_df

    assert check_columns_exists(test_df, 'purchase_date', 'customer_id', 'amount', 'category', 'city', 'address', 'street_name', 'street_number', 'zip_code') is True
    assert check_columns_exists(test_df, 'sale_date', 'customer_id', 'amount', 'category', 'city', 'address', 'street_name', 'street_number', 'zip_code') is False
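A likely root cause here (an inference from the error, not confirmed in the thread): a Databricks notebook is not a plain `.py` file on the filesystem, so Python's import machinery cannot load it with `import transform_functions`; only regular Python files are importable as modules. Moving the functions into a plain file fixes that. Below is a minimal, Spark-free sketch of such a file, with `check_columns_exists` reduced to operate on a plain column-name list so it runs anywhere; the real function presumably takes a DataFrame and inspects `df.columns`.

```python
# transform_functions.py -- a plain .py file (not a notebook), so that
# `import transform_functions` works from test_trans.py.

def check_columns_exists(columns, *expected):
    """Return True only if every expected column name is present.

    Accepts the column list directly in this sketch; the real function
    presumably takes a DataFrame and reads df.columns.
    """
    return all(name in columns for name in expected)


# Mirrors the shape of the two assertions in test_check_columns_exist:
cols = ["purchase_date", "customer_id", "amount", "category", "city", "address"]
print(check_columns_exists(cols, "purchase_date", "amount"))  # True
print(check_columns_exists(cols, "sale_date"))                # False
```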

 

StephanKnox
New Contributor II

PS: I have restarted the cluster and re-ran my run_tests notebook, and now I am getting a different error:

E     File "/Workspace/Repos/SBIT/SBIT/test_trans.py", line 36
E       from transform_functions import *
E       ^
E   SyntaxError: import * only allowed at module level

 

I am totally confused now …
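For what it's worth, this second error is unrelated to paths: Python rejects `from module import *` anywhere except module level (the top of the file), and it does so at compile time, before any test runs. A small demonstration of that, plus the usual fix of naming the imports explicitly at the top of test_trans.py (`check_columns_exists` is taken from the test shown above):

```python
# A star import inside a function body fails to compile at all:
src = (
    "def test_check_columns_exist():\n"
    "    from transform_functions import *\n"
)
raised = False
try:
    compile(src, "test_trans.py", "exec")
except SyntaxError:
    raised = True

print(raised)  # True: rejected before anything is executed

# Fix: move the import to the top of test_trans.py, preferably naming
# what you need explicitly:
#   from transform_functions import check_columns_exists
```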
