Unit Testing with PyTest in Databricks - ModuleNotFoundError

StephanKnox
New Contributor III

Dear all,

I am following the guide in this article: https://docs.databricks.com/en/notebooks/testing.html
However, I am unable to run pytest because of the following error: ImportError while importing test module '/Workspace/Users/deadmanhide@gmail.com/test_trans.py'

and

E ModuleNotFoundError: No module named 'test_trans'

 

My setup is:

Workspace:

run_tests.py (notebook where I install pytest and run pytest.main)

test_trans.py (Python file containing the unit tests)

transform (folder)

    -- operations.py (notebook with transform and cleansing functions)

    -- __init__.py

I also tried putting the test file and the transform file inside folders with an __init__.py file so they would be treated as packages, and tried the same setup in Repos instead of the workspace.

Clearly I am doing something wrong, would greatly appreciate any help,
Kind regards

StephanKnox
New Contributor III

Thank you very much for the detailed answer, Kaniz! I followed the steps you described and decided to simplify the structure further: I now have all 3 files at the same level, and pytest can collect my 2 tests.

The issue is with the module imports again. I followed the Databricks guide on unit testing, however it seems I am still missing something.

In my workspace I have two notebooks (transform_functions and run_tests) and a Python file, test_trans.py, all at the same level.

The error I am getting is:


FAILED test_trans.py::test_check_columns_exist - ModuleNotFoundError: No module named 'transform_functions'
FAILED test_trans.py::test_transform_replace_nulls - ModuleNotFoundError: No module named 'transform_functions'

def test_check_columns_exist(get_sparksession, get_test_df):
>       from transform_functions import *
E       ModuleNotFoundError: No module named 'transform_functions'

test_trans.py:36: ModuleNotFoundError

 

transform_functions is a notebook at the same level as test_trans.py, so I am a bit confused as to why I am getting this error.
Full unit test function code which causes the error:

import pytest
import pyspark
from pyspark.sql import SparkSession, DataFrame
from pyspark.testing import assertDataFrameEqual, assertSchemaEqual
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, FloatType, DateType, ArrayType


TESTFILE_PATH = 'file:/Workspace/Users/deadmanhide@gmail.com/test_data/testdata.json'

@pytest.fixture()
def get_sparksession():
    return SparkSession.builder \
                    .appName('integrity-tests') \
                    .getOrCreate()


@pytest.fixture()
def get_test_df(get_sparksession) -> DataFrame:
    spark = get_sparksession
    expected_schema = StructType([
    StructField("purchase_date", StringType(), True),
    StructField("customer_id", IntegerType(), True),
    StructField("amount", FloatType(), True),
    StructField("category", StringType(), True),
    StructField("city", StringType(), True),
    StructField("address", StructType([
        StructField("street_name", StringType(), True),
        StructField("street_number", IntegerType(), True),
        StructField("zip_code", StringType(), True)
    ]), True)])

    return spark.read.format('json').schema(expected_schema).load(TESTFILE_PATH)


def test_check_columns_exist(get_sparksession, get_test_df) :
    from transform_functions import *
    spark = get_sparksession
    test_df = get_test_df

    assert check_columns_exists(test_df, 'purchase_date', 'customer_id', 'amount', 'category', 'city', 'address', 'street_name', 'street_number', 'zip_code') is True
    assert check_columns_exists(test_df, 'sale_date', 'customer_id', 'amount', 'category', 'city', 'address', 'street_name', 'street_number', 'zip_code') is False

 

StephanKnox
New Contributor III

PS: I restarted the cluster and ran my run_tests notebook again, and now I am getting a different error:

E     File "/Workspace/Repos/SBIT/SBIT/test_trans.py", line 36
E       from transform_functions import *
E       ^
E   SyntaxError: import * only allowed at module level

 

I am totally confused now …
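A note on the SyntaxError above: this one is a plain Python rule, not something specific to Databricks or pytest. A wildcard import is only allowed at module level, so it cannot sit inside a test function. The sketch below compiles two small snippets to show the difference; it uses the standard math module purely as a stand-in for transform_functions, and the helper name compiles is made up for illustration.

```python
import textwrap

def compiles(source):
    """Return (True, None) if the source compiles, else (False, error message)."""
    try:
        compile(textwrap.dedent(source), "<example>", "exec")
        return True, None
    except SyntaxError as exc:
        return False, exc.msg

# Wildcard import inside a function body: rejected by the compiler.
inside_function = """
    def test_something():
        from math import *
        assert sqrt(4) == 2
"""

# Explicit import at module level: accepted.
at_module_level = """
    from math import sqrt

    def test_something():
        assert sqrt(4) == 2
"""

ok_inside, msg_inside = compiles(inside_function)
ok_module, msg_module = compiles(at_module_level)
print(ok_inside, msg_inside)   # False import * only allowed at module level
print(ok_module, msg_module)   # True None
```

Applied to test_trans.py, this would mean moving the import to the top of the file, ideally with explicit names (for example, from transform_functions import check_columns_exists). That removes the SyntaxError only; the module still has to be importable for the test to pass.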

akhtar
New Contributor II

Hi @Kaniz, I am also facing the same issue, "Unit Testing with PyTest in Databricks - ModuleNotFoundError". Can you please help me too? I am not able to see your reply in this thread.

saurabh18cs
Honored Contributor III

Hi,

After a lot of trial and error I had some success. See if this is what you are all looking for:

notebook_test.py (this is a Python code file)

from pyspark.sql import functions as F
def sum_values(df):
    return df.agg(F.sum("value")).first()[0]
def reverse(s):
    return s[::-1]
# Return the functions as a dictionary
# dbutils.notebook.exit({
#     "sum_values": sum_values,
#     "reverse": reverse
# })
 
 
test_sum (this is a notebook; both files sit side by side in the same folder)
cmd1
!pip install pytest
 
cmd2
import pytest
import os
import sys
sys.dont_write_bytecode = True
os.chdir("/Workspace/Users/saurabh.............../")
 
cmd3
import pytest
from pyspark.sql import SparkSession
from notebook_test import sum_values, reverse

# Run the notebook and import the functions
# notebook_path = "/Workspace/Users/saurabh................../notebook_test1"
# notebook_output = dbutils.notebook.run(notebook_path, 60)
# functions = eval(notebook_output)
# sum_values = functions["sum_values"]
# reverse = functions["reverse"]

@pytest.fixture(scope="module")
def spark():
    spark = SparkSession.builder \
        .appName("pytest-pyspark-local-testing") \
        .master("local[*]") \
        .getOrCreate()
    yield spark
    spark.stop()

def test_sum_values(spark):
    data = [(1,), (2,), (3,)]
    df = spark.createDataFrame(data, ["value"])
    result = sum_values(df)
    assert result == 6

def test_reverse():
    assert reverse("hello") == "olleh"
    assert reverse("world") == "dlrow"
    assert reverse("") == ""
    assert reverse("a") == "a"
 
cmd4
# In Databricks notebook
pytest.main(["-v"], plugins = [test_sum_values(spark)])
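For anyone wondering why the folder setup in cmd2 matters: Python only finds a module such as notebook_test if the folder that contains the .py file is on its search path (sys.path). A possible alternative to changing the working directory is to add that folder to sys.path explicitly. The sketch below shows the effect with a temporary folder and a made-up module name (demo_transforms) standing in for the workspace folder and the real file; it is an illustration under those assumptions, not a Databricks-specific recipe.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the workspace folder that holds the code under test.
project_dir = Path(tempfile.mkdtemp())
(project_dir / "demo_transforms.py").write_text(
    "def reverse(s):\n    return s[::-1]\n"
)

def try_import(name):
    """Return the imported module, or None if Python cannot find it."""
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError:
        return None

# The folder is not on sys.path yet, so the import fails.
before = try_import("demo_transforms")

# Put the folder on the search path, then retry.
sys.path.insert(0, str(project_dir))
importlib.invalidate_caches()
after = try_import("demo_transforms")

print(before)                  # None
print(after.reverse("hello"))  # olleh
```

Note that this only helps for plain .py files. As far as I understand, a Databricks notebook is not importable with a regular import statement, which would explain the earlier ModuleNotFoundError for a notebook sitting next to the test file.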