Unit Testing with PyTest in Databricks - ModuleNotFoundError

New Contributor II

Dear all,

I am following the guide in this article:
However, I am unable to run pytest due to the following error: ImportError while importing test module '/Workspace/Users/'


E ModuleNotFoundError: No module named 'test_trans'


My setup is:

Workspace: (NoteBook where I install pytest and run pytest.main) file containing the unit tests)

transform (folder)

    -- with transform and cleansing functions)


I also tried putting the test file and the transform file inside folders with an __init__.py file so that they would be treated as a package, and I tried the same in Repos instead of the workspace.

Clearly I am doing something wrong, would greatly appreciate any help,
Kind regards


Community Manager

Hi @StephanKnox, ensure that your directory structure is set up correctly. Based on your description, it should look something like this:

└── transform/
  • In both the transform folder and the root directory, add an empty __init__.py file (if you haven’t already). This makes Python recognize these directories as packages.
  • You mentioned trying this, but let’s make sure it’s done correctly.
  • Sometimes, pytest may not discover your modules due to path issues.
  • Try one of the following:
    • Define the PYTHONPATH environment variable to include the root directory of your project:
      export PYTHONPATH=/path/to/Workspace
    • Run pytest using python -m pytest instead of just pytest. This can help pytest find your modules correctly.
  • Navigate to the root directory of the project (the one that contains the transform folder).
  • Run pytest from there:
  • python -m pytest
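A notebook has no shell session in which to export PYTHONPATH, so a rough Python equivalent of the two options above is to put the project root on sys.path and call pytest.main directly. This is only a minimal sketch: add_project_root and run_tests are hypothetical helper names, and the path in the example call is the same placeholder used above.

```python
import os
import sys


def add_project_root(root: str) -> str:
    """Put `root` at the front of sys.path (once) so its modules can be imported."""
    root = os.path.abspath(root)
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


def run_tests(root: str) -> int:
    """Run pytest against `root` in the current Python process."""
    import pytest  # imported lazily so the helper above works without pytest

    add_project_root(root)
    # Workspace folders may be read-only, so avoid writing .pyc files there.
    sys.dont_write_bytecode = True
    # "-p no:cacheprovider" stops pytest from creating a .pytest_cache directory.
    return int(pytest.main([root, "-v", "-p", "no:cacheprovider"]))


# Example call from a notebook cell (placeholder path):
# exit_code = run_tests("/path/to/Workspace")
```

Because pytest runs inside the notebook's own Python process here, anything added to sys.path before the call is visible to the test modules.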


  • In your test file, ensure that you’re importing the necessary modules correctly.
  • For example, if you’re trying to import something from transform.operations, use:
  • from transform.operations import my_function
  • Make sure you’re using the correct Python version (the same one you used to install pytest).
  • If you haven’t already, install pytest in your workspace:
    !pip install pytest
  • Sometimes, cached imports can cause issues. Try running pytest with the --cache-clear flag:
    python -m pytest --cache-clear
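To make the package and import points above concrete, here is a small sketch that builds a scratch copy of such a layout and imports from it. Only the names transform.operations and my_function come from the example above; the function body and the temporary directory are assumptions made purely for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway project root that mirrors the layout described above.
root = Path(tempfile.mkdtemp())
package_dir = root / "transform"
package_dir.mkdir()

# The empty __init__.py marks the folder as a regular package.
(package_dir / "__init__.py").write_text("")

# Hypothetical module content, only here so there is something to import.
(package_dir / "operations.py").write_text(
    "def my_function(value):\n"
    "    return value.strip().lower()\n"
)

# The root directory (not the package folder itself) must be on sys.path.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

from transform.operations import my_function

print(my_function("  Hello "))  # prints "hello"
```

The detail that usually matters is which directory ends up on sys.path: it has to be the parent of transform, otherwise the dotted import cannot be resolved.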

 If you encounter any further problems, feel free to ask for more assistance! 😊


New Contributor II

Thank you very much for a detailed answer Kaniz! I have followed the steps you described and decided to simplify the structure further, now I have all 3 files at the same level and now I can collect my 2 tests.

The issue is with the module imports again. I have followed the guide from Databricks regarding unit testing, however it seems I am still missing something.

In my workspace I have two notebooks (transform_functions and run_tests) and a Python file, all at the same level.

The error I am getting is:

FAILED - ModuleNotFoundError: No module named 'transform_functions' FAILED - ModuleNotFoundError: No module named 'transform_functions'

def test_check_columns_exist(get_sparksession, get_test_df):
>       from transform_functions import *
E       ModuleNotFoundError: No module named 'transform_functions'

ModuleNotFoundError


transform_functions is a notebook at the same level as the test file, so I am a bit confused as to why I am getting this error...
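One way to see what Python itself can find is to ask the import system directly. The following is a small diagnostic sketch using only the standard library; describe_module is a hypothetical helper name. It reports whether a module name resolves on the current sys.path and, if so, from which file.

```python
import importlib.util
import sys


def describe_module(name: str) -> str:
    """Return a one-line report on whether `name` can be found for import."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        spec = None
    if spec is None:
        return f"{name}: not found on sys.path"
    return f"{name}: found at {spec.origin}"


print(describe_module("transform_functions"))
print("First sys.path entries:", sys.path[:3])
```

If the name is reported as not found, nothing on sys.path currently exposes it as an importable module, whatever the workspace browser shows.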
Full unit test function code which causes the error:

import pytest
import pyspark
from pyspark.sql import SparkSession, DataFrame
from pyspark.testing import assertDataFrameEqual, assertSchemaEqual
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, FloatType, DateType, ArrayType

TESTFILE_PATH = 'file:/Workspace/Users/'

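@pytest.fixture
def get_sparksession():
    # Build (or reuse) the Spark session shared by the tests.
    return SparkSession.builder \
                    .appName('integrity-tests') \
                    .getOrCreate()

@pytest.fixture
def get_test_df(get_sparksession) -> DataFrame:
    spark = get_sparksession
    expected_schema = StructType([
        StructField("purchase_date", StringType(), True),
        StructField("customer_id", IntegerType(), True),
        StructField("amount", FloatType(), True),
        StructField("category", StringType(), True),
        StructField("city", StringType(), True),
        StructField("address", StructType([
            StructField("street_name", StringType(), True),
            StructField("street_number", IntegerType(), True),
            StructField("zip_code", StringType(), True)
        ]), True)])
    # Assumption: the step that loads the test data is missing from the post;
    # reading JSON from TESTFILE_PATH is a guess, adjust it to the real format.
    return spark.read.schema(expected_schema).json(TESTFILE_PATH)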


def test_check_columns_exist(get_sparksession, get_test_df) :
    from transform_functions import *
    spark = get_sparksession
    test_df = get_test_df

    assert check_columns_exists(test_df, 'purchase_date', 'customer_id', 'amount', 'category', 'city', 'address', 'street_name', 'street_number', 'zip_code') is True
    assert check_columns_exists(test_df, 'sale_date', 'customer_id', 'amount', 'category', 'city', 'address', 'street_name', 'street_number', 'zip_code') is False
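The function under test is not shown in this thread. Purely as an illustration of what the two assertions expect, here is a hypothetical sketch of check_columns_exists that also looks inside nested structs, so that 'street_name' counts as present. It relies only on the fields, name and dataType attributes that PySpark's StructType and StructField expose; the Field and Struct classes are stand-ins so the sketch runs without Spark, and the real implementation may well differ.

```python
from dataclasses import dataclass, field
from typing import Any, List, Set


@dataclass
class Field:  # stand-in for pyspark.sql.types.StructField
    name: str
    dataType: Any = None


@dataclass
class Struct:  # stand-in for pyspark.sql.types.StructType
    fields: List[Field] = field(default_factory=list)


def collect_field_names(schema: Any) -> Set[str]:
    """Collect top-level and nested field names from a struct-like schema."""
    names: Set[str] = set()
    for f in getattr(schema, "fields", []):
        names.add(f.name)
        names |= collect_field_names(f.dataType)  # recurse into nested structs
    return names


def check_columns_exists(df_or_schema: Any, *columns: str) -> bool:
    """Return True only if every requested column name is present."""
    # A DataFrame carries its schema in `.schema`; a bare schema is used as-is.
    schema = getattr(df_or_schema, "schema", df_or_schema)
    return set(columns) <= collect_field_names(schema)


address = Struct([Field("street_name"), Field("zip_code")])
schema = Struct([Field("purchase_date"), Field("address", address)])
print(check_columns_exists(schema, "purchase_date", "street_name"))  # True
print(check_columns_exists(schema, "sale_date", "street_name"))      # False
```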


New Contributor II

PS: I have restarted the cluster and run my run_tests notebook again, and now I am getting a different error:

E     File "/Workspace/Repos/SBIT/SBIT/", line 36
E       from transform_functions import *
E       ^
E   SyntaxError: import * only allowed at module level


I am totally confused now …
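For what it is worth, the second error can be reproduced without Databricks or pytest. As a small sketch using only the built-in compile function, the snippet below shows that Python rejects a star import inside a function body at compile time, while the same statement at the top of a module is accepted. The os module is used only as a stand-in for transform_functions.

```python
inside_function = (
    "def test_something():\n"
    "    from os import *\n"
)
at_module_level = "from os import *\n"


def compiles(source: str) -> bool:
    """Return True if `source` compiles, False if it raises SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError as err:
        print("SyntaxError:", err.msg)
        return False


print(compiles(inside_function))   # False
print(compiles(at_module_level))   # True
```

Moving the import to the top of the test file, or importing the needed names explicitly, avoids this particular SyntaxError; it does not by itself settle whether the module can be found.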

