Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Unit Testing with PyTest in Databricks - ModuleNotFoundError

New Contributor II

Dear all,

I am following the guide in this article; however, I am unable to run pytest due to the following error: ImportError while importing test module '/Workspace/Users/'


E ModuleNotFoundError: No module named 'test_trans'


My setup is:

Workspace: a notebook where I install pytest and call pytest.main, plus a Python file containing the unit tests

transform (folder)

    -- a Python file with the transform and cleansing functions


I also tried putting an __init__.py file in the folders alongside the test file and the transform file so they would be treated as packages, and I tried the same in Repos instead of the workspace.

Clearly I am doing something wrong, would greatly appreciate any help,
Kind regards


Community Manager

Hi @StephanKnox, ensure that your directory structure is set up correctly. Based on your description, it should look something like this:

Workspace/
├── __init__.py
├── test_trans.py      (your unit tests)
└── transform/
    ├── __init__.py
    └── operations.py  (transform and cleansing functions)
  • In both the transform folder and the root directory, add an empty __init__.py file (if you haven’t already). This makes Python recognize these directories as packages.
  • You mentioned trying this, but let’s make sure it’s done correctly.
  • Sometimes, pytest may not discover your modules due to path issues.
  • Try one of the following:
    • Define the PYTHONPATH environment variable to include the root directory of your project:
      export PYTHONPATH=/path/to/Workspace
    • Run pytest using python -m pytest instead of just pytest. This can help pytest find your modules correctly.
  • Navigate to the root directory (where your test file and the transform folder are located).
  • Run pytest from there:
  • python -m pytest


  • In your file, ensure that you’re importing the necessary modules correctly.
  • For example, if you’re trying to import something from transform.operations, use:
  • from transform.operations import my_function
  • Make sure you’re using the correct Python version (the same one you used to install pytest).
  • If you haven’t already, install pytest in your workspace:
    !pip install pytest
  •  Sometimes, cached imports can cause issues. Try running pytest with the --cache-clear flag:
python -m pytest --cache-clear

 If you encounter any further problems, feel free to ask for more assistance! 😊
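In a Databricks notebook, the path steps above boil down to making the project root importable before pytest collects anything. A minimal run_tests sketch (the /Workspace/Users/<you>/project path is a placeholder, not taken from this thread; adjust it to your own workspace folder):

```python
import sys

# Placeholder project root -- replace with your own workspace folder.
project_root = "/Workspace/Users/<you>/project"

# Put the project root on the import path so pytest can resolve
# modules such as test_trans and the transform package.
if project_root not in sys.path:
    sys.path.insert(0, project_root)

# In the notebook cell you would then run the suite with:
#   import pytest
#   retcode = pytest.main(["-v", "--cache-clear", project_root])
#   assert retcode == 0, "some tests failed"
```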


New Contributor II

Thank you very much for the detailed answer, Kaniz! I have followed the steps you described and decided to simplify the structure further; now I have all three files at the same level, and pytest can collect my two tests.

The issue is with the module imports again. I have followed the guide from Databricks regarding unit testing; however, it seems I am still missing something.

In my workspace I have two notebooks (transform_functions and run_tests) and a Python file containing the unit tests, all at the same level.

The error I am getting is:

FAILED - ModuleNotFoundError: No module named 'transform_functions'
FAILED - ModuleNotFoundError: No module named 'transform_functions'

def test_check_columns_exist(get_sparksession, get_test_df):
>       from transform_functions import *
E       ModuleNotFoundError: No module named 'transform_functions'


transform_functions is a notebook at the same level as the test file, so I am a bit confused as to why I am getting this error...
Full unit test function code which causes the error:

import pytest
import pyspark
from pyspark.sql import SparkSession, DataFrame
from pyspark.testing import assertDataFrameEqual, assertSchemaEqual
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, FloatType, DateType, ArrayType

TESTFILE_PATH = 'file:/Workspace/Users/'

def get_sparksession():
    return SparkSession.builder \
                    .appName('integrity-tests') \
                    .getOrCreate()

def get_test_df(get_sparksession) -> DataFrame:
    spark = get_sparksession
    expected_schema = StructType([
        StructField("purchase_date", StringType(), True),
        StructField("customer_id", IntegerType(), True),
        StructField("amount", FloatType(), True),
        StructField("category", StringType(), True),
        StructField("city", StringType(), True),
        StructField("address", StructType([
            StructField("street_name", StringType(), True),
            StructField("street_number", IntegerType(), True),
            StructField("zip_code", StringType(), True)
        ]), True)])


def test_check_columns_exist(get_sparksession, get_test_df):
    from transform_functions import *
    spark = get_sparksession
    test_df = get_test_df

    assert check_columns_exists(test_df, 'purchase_date', 'customer_id', 'amount', 'category', 'city', 'address', 'street_name', 'street_number', 'zip_code') is True
    assert check_columns_exists(test_df, 'sale_date', 'customer_id', 'amount', 'category', 'city', 'address', 'street_name', 'street_number', 'zip_code') is False
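Separately from the import error, two details in the snippet above will surface once collection succeeds: neither helper is decorated with @pytest.fixture, and get_test_df builds a schema but never returns a DataFrame, so any test requesting it would receive None. The fixture pattern, sketched here with a stand-in object (FakeSession and its create_df method are illustrative, not a Spark API) so it runs without Spark:

```python
import pytest

class FakeSession:
    """Stand-in for SparkSession so the fixture pattern runs without Spark."""
    def create_df(self, columns):
        return {"columns": columns}

@pytest.fixture(scope="session")
def get_sparksession():
    # With a real SparkSession the builder chain must end in .getOrCreate();
    # the builder alone is not a session.
    return FakeSession()

@pytest.fixture
def get_test_df(get_sparksession):
    # A fixture must *return* its value; without the return statement
    # every test that requests it receives None.
    return get_sparksession.create_df(["purchase_date", "customer_id", "amount"])

def test_columns(get_test_df):
    assert "amount" in get_test_df["columns"]
```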


New Contributor II

PS: I have restarted the cluster and ran my run_tests notebook again and now I am getting a different error:

E   File "/Workspace/Repos/SBIT/SBIT/", line 36
E     from transform_functions import *
E     ^
E   SyntaxError: import * only allowed at module level


I am totally confused now …
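For the record, this new error is unrelated to module discovery: Python only permits `from module import *` at module level, never inside a function, so the wildcard import inside test_check_columns_exist is rejected at compile time. A function-local import has to bind the module to a name instead (the stdlib math module is used below as a stand-in for transform_functions):

```python
# from transform_functions import *   # legal only at the top of the file

def check_with_function_local_import():
    # Inside a function, import the module by name instead of using "*";
    # "import transform_functions" would follow the same pattern.
    import math
    return math.sqrt(16)

result = check_with_function_local_import()
```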
