Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Unit Testing with PyTest in Databricks - ModuleNotFoundError

StephanKnox
New Contributor III

Dear all,

I am following the guide in this article: https://docs.databricks.com/en/notebooks/testing.html
however I am unable to run pytest due to the following error: ImportError while importing test module '/Workspace/Users/deadmanhide@gmail.com/test_trans.py'

and

E ModuleNotFoundError: No module named 'test_trans'

 

My setup is:

Workspace:

run_tests.py (notebook where I install pytest and run pytest.main)

test_trans.py (Python file containing the unit tests)

transform (folder)

    -- operations.py (notebook with transform and cleansing functions)

    -- __init__.py

I also tried putting the test file and the transform file inside folders with an __init__.py file so they would be treated as a package, and I tried the same setup in Repos instead of the workspace.

Clearly I am doing something wrong, would greatly appreciate any help,
Kind regards
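
Editorial note: a ModuleNotFoundError at import time generally means the folder holding the module is not on Python's sys.path. The sketch below shows that mechanism in isolation. The folder and the module name demo_transform are hypothetical stand-ins, not files from this thread. It also assumes the code lives in a plain .py file; as far as we know, a Databricks notebook is not importable as a regular module, so shared functions are safer in a Python file.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical folder and module, created here so the sketch runs anywhere.
project_dir = Path(tempfile.mkdtemp())
(project_dir / "demo_transform.py").write_text(
    "def clean(value):\n"
    "    return value.strip().lower()\n"
)

# The folder is not on sys.path yet, so the import fails.
try:
    importlib.import_module("demo_transform")
    found_before = True
except ModuleNotFoundError:
    found_before = False

# Adding the folder to sys.path makes the module importable.
sys.path.insert(0, str(project_dir))
importlib.invalidate_caches()
demo_transform = importlib.import_module("demo_transform")

print(found_before)                      # False
print(demo_transform.clean("  Hello "))  # hello
```

Printing sys.path in the notebook that calls pytest.main is a quick way to check whether the folder with the code under test is actually listed.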

4 REPLIES

StephanKnox
New Contributor III

Thank you very much for the detailed answer, Kaniz! I followed the steps you described and decided to simplify the structure further: all 3 files are now at the same level, and my 2 tests are collected.

The issue is with the module imports again. I have followed the Databricks guide on unit testing, however it seems I am still missing something.

In my workspace I have two notebooks (transform_functions and run_tests) and a Python file, test_trans.py, all at the same level.

The error I am getting is:


FAILED test_trans.py::test_check_columns_exist - ModuleNotFoundError: No module named 'transform_functions'
FAILED test_trans.py::test_transform_replace_nulls - ModuleNotFoundError: No module named 'transform_functions'

    def test_check_columns_exist(get_sparksession, get_test_df):
>       from transform_functions import *
E       ModuleNotFoundError: No module named 'transform_functions'

test_trans.py:36: ModuleNotFoundError

 

transform_functions is a notebook at the same level as test_trans.py, so I am a bit confused as to why I am getting this error.
Full unit test function code which causes the error:

import pytest
import pyspark
from pyspark.sql import SparkSession, DataFrame
from pyspark.testing import assertDataFrameEqual, assertSchemaEqual
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, FloatType, DateType, ArrayType


TESTFILE_PATH = 'file:/Workspace/Users/deadmanhide@gmail.com/test_data/testdata.json'

@pytest.fixture()
def get_sparksession():
    return SparkSession.builder \
                    .appName('integrity-tests') \
                    .getOrCreate()


@pytest.fixture()
def get_test_df(get_sparksession) -> DataFrame:
    spark = get_sparksession
    expected_schema = StructType([
    StructField("purchase_date", StringType(), True),
    StructField("customer_id", IntegerType(), True),
    StructField("amount", FloatType(), True),
    StructField("category", StringType(), True),
    StructField("city", StringType(), True),
    StructField("address", StructType([
        StructField("street_name", StringType(), True),
        StructField("street_number", IntegerType(), True),
        StructField("zip_code", StringType(), True)
    ]), True)])

    return spark.read.format('json').schema(expected_schema).load(TESTFILE_PATH)


def test_check_columns_exist(get_sparksession, get_test_df) :
    from transform_functions import *
    spark = get_sparksession
    test_df = get_test_df

    assert check_columns_exists(test_df, 'purchase_date', 'customer_id', 'amount', 'category', 'city', 'address', 'street_name', 'street_number', 'zip_code') is True
    assert check_columns_exists(test_df, 'sale_date', 'customer_id', 'amount', 'category', 'city', 'address', 'street_name', 'street_number', 'zip_code') is False

 

StephanKnox
New Contributor III

PS: I restarted the cluster and ran my run_tests notebook again, and now I am getting a different error:

E     File "/Workspace/Repos/SBIT/SBIT/test_trans.py", line 36
E       from transform_functions import *
E       ^
E   SyntaxError: import * only allowed at module level

 

I am totally confused now …
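
Editorial note: the SyntaxError above is a Python 3 language rule, not something specific to Databricks. A wildcard import (`from module import *`) is only allowed at the top of a module, never inside a function body. The sketch below compiles two hypothetical test sources to show the difference. It uses the standard-library math module in place of transform_functions, which is not available here.

```python
# Source text for two hypothetical test modules, compiled rather than
# imported so the SyntaxError can be caught and shown.
star_import_in_function = (
    "def test_example():\n"
    "    from math import *\n"
    "    assert sqrt(4) == 2\n"
)
import_at_module_level = (
    "from math import sqrt\n"
    "\n"
    "def test_example():\n"
    "    assert sqrt(4) == 2\n"
)


def compiles(source: str) -> bool:
    """Return True if the source compiles, False on a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError as exc:
        print(f"SyntaxError: {exc.msg}")
        return False


print(compiles(star_import_in_function))  # False
print(compiles(import_at_module_level))   # True
```

Moving the import to the top of test_trans.py, and naming the imported functions explicitly, avoids this error. The module still has to be importable for the test to run.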

akhtar
New Contributor II

Hi @Kaniz, I am also facing the same issue, "Unit Testing with PyTest in Databricks - ModuleNotFoundError". Can you please help me too? I am not able to see your reply in this thread.

saurabh18cs
Contributor II

Hi,

After a lot of trial and error I had some success. See if this is what you are all looking for:

notebook_test.py   (this is a Python code file)

from pyspark.sql import functions as F
def sum_values(df):
    return df.agg(F.sum("value")).first()[0]
def reverse(s):
    return s[::-1]
# Return the functions as a dictionary
# dbutils.notebook.exit({
#     "sum_values": sum_values,
#     "reverse": reverse
# })
 
 
test_sum (this is a notebook; both files sit side by side in the same folder)
cmd1
!pip install pytest
 
cmd2
import pytest
import os
import sys
sys.dont_write_bytecode = True
os.chdir("/Workspace/Users/saurabh.............../")
 
cmd3
import pytest
from pyspark.sql import SparkSession
from notebook_test import sum_values, reverse

# Run the notebook and import the functions
# notebook_path = "/Workspace/Users/saurabh................../notebook_test1"
# notebook_output = dbutils.notebook.run(notebook_path, 60)
# functions = eval(notebook_output)
# sum_values = functions["sum_values"]
# reverse = functions["reverse"]

@pytest.fixture(scope="module")
def spark():
    spark = SparkSession.builder \
        .appName("pytest-pyspark-local-testing") \
        .master("local[*]") \
        .getOrCreate()
    yield spark
    spark.stop()

def test_sum_values(spark):
    data = [(1,), (2,), (3,)]
    df = spark.createDataFrame(data, ["value"])
    result = sum_values(df)
    assert result == 6

def test_reverse():
    assert reverse("hello") == "olleh"
    assert reverse("world") == "dlrow"
    assert reverse("") == ""
    assert reverse("a") == "a"
 
cmd4
# In Databricks notebook
pytest.main(["-v"], plugins = [test_sum_values(spark)])
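
Editorial note: the `plugins` argument of pytest.main expects plugin objects, so passing the result of calling a test function there is unlikely to do what is intended. pytest normally discovers tests itself by collecting files named test_*.py or *_test.py. The sketch below shows that pattern, assuming pytest is installed. The files reverse_utils_example.py and test_reverse_utils_example.py are hypothetical and are written to a temporary folder only to keep the example self-contained. It deliberately leaves Spark out.

```python
import sys
import tempfile
from pathlib import Path

import pytest

# Avoid writing .pyc files; workspace folders may be read-only.
sys.dont_write_bytecode = True

# Hypothetical stand-ins for the code file and the test file.
project_dir = Path(tempfile.mkdtemp())
(project_dir / "reverse_utils_example.py").write_text(
    "def reverse(s):\n"
    "    return s[::-1]\n"
)
(project_dir / "test_reverse_utils_example.py").write_text(
    "from reverse_utils_example import reverse\n"
    "\n"
    "def test_reverse():\n"
    "    assert reverse('hello') == 'olleh'\n"
    "    assert reverse('') == ''\n"
)

# Point pytest at the folder; it collects the test file on its own.
# "-p no:cacheprovider" stops pytest from writing a .pytest_cache folder.
retcode = pytest.main([str(project_dir), "-v", "-p", "no:cacheprovider"])

# Fail the cell if any test failed.
assert retcode == 0, "The pytest invocation failed. See the log for details."
```

Keeping the tests in their own file also means the code file does not need a name that matches pytest's collection patterns. notebook_test.py, for example, matches *_test.py and would itself be collected as a test module.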
