Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Unit Testing with PyTest in Databricks - ModuleNotFoundError

StephanKnox
New Contributor II

Dear all,

I am following the guide in this article: https://docs.databricks.com/en/notebooks/testing.html
However, I am unable to run pytest due to the following error: ImportError while importing test module '/Workspace/Users/deadmanhide@gmail.com/test_trans.py'

and

E ModuleNotFoundError: No module named 'test_trans'

 

My setup is:

Workspace:

run_tests.py (NoteBook where I install pytest and run pytest.main)

test_trans.py(Python file containing the unit tests)

transform (folder)

    -- operations.py(Notebook with transform and cleansing functions)

    -- __init__.py

I also tried putting the test file and the transform file inside folders containing an __init__.py file so they would be treated as packages, and I tried the same setup in Repos instead of the workspace.

Clearly I am doing something wrong; I would greatly appreciate any help.
Kind regards

3 REPLIES

Kaniz
Community Manager

Hi @StephanKnox, ensure that your directory structure is set up correctly. Based on your description, it should look something like this:

Workspace/
├── run_tests.py
├── test_trans.py
└── transform/
    ├── operations.py
    └── __init__.py
  • Add an empty __init__.py file to both the transform folder and the root directory (if you haven’t already). This makes Python recognize these directories as packages. You mentioned trying this, but let’s make sure it’s done correctly.
  • Sometimes pytest cannot discover your modules due to path issues. Try one of the following:
    • Define the PYTHONPATH environment variable to include the root directory of your project:
      export PYTHONPATH=/path/to/Workspace
    • Run pytest using python -m pytest instead of just pytest. This adds the current directory to the module search path, which can help pytest find your modules correctly.
  • Navigate to the root directory (where run_tests.py and test_trans.py are located) and run pytest from there:
      python -m pytest
  • In your test_trans.py file, ensure that you’re importing the necessary modules correctly. For example, if you’re trying to import something from transform.operations, use:
      from transform.operations import my_function
  • Make sure you’re using the correct Python version (the same one you used to install pytest).
  • If you haven’t already, install pytest in your workspace:
      !pip install pytest
  • Sometimes cached imports can cause issues. Try running pytest with the --cache-clear flag:
      python -m pytest --cache-clear

 If you encounter any further problems, feel free to ask for more assistance! 😊
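Putting the path and cache advice above together, a minimal first cell for a run_tests.py notebook might look like the sketch below. The project-root value here is a hypothetical stand-in; on Databricks, substitute the workspace folder that actually contains test_trans.py and the transform package.

```python
import os
import sys

# Hypothetical stand-in for the project root; on Databricks this would be
# the workspace folder containing test_trans.py and transform/, e.g.
# "/Workspace/Users/<you>/project".
project_root = os.getcwd()

# Prepend the root so `from transform.operations import my_function`
# resolves: Python searches sys.path entries in order when importing.
if project_root not in sys.path:
    sys.path.insert(0, project_root)

# Skip writing .pyc bytecode files, as the Databricks testing guide
# recommends for workspace files.
sys.dont_write_bytecode = True

# Then hand control to pytest, e.g.:
#   import pytest
#   retcode = pytest.main(["-v", "--cache-clear", "."])
#   assert retcode == 0, "Some tests failed"
```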


StephanKnox
New Contributor II

Thank you very much for the detailed answer, Kaniz! I have followed the steps you described and simplified the structure further: all 3 files are now at the same level, and pytest now collects my 2 tests.

The issue is with the module imports again. I have followed the Databricks unit-testing guide, but it seems I am still missing something.

In my workspace I have two notebooks (transform_functions and run_tests) and a Python file, test_trans.py, all at the same level.

The error I am getting is:


FAILED test_trans.py::test_check_columns_exist - ModuleNotFoundError: No module named 'transform_functions'
FAILED test_trans.py::test_transform_replace_nulls - ModuleNotFoundError: No module named 'transform_functions'

    def test_check_columns_exist(get_sparksession, get_test_df):
    >       from transform_functions import *
    E       ModuleNotFoundError: No module named 'transform_functions'

    test_trans.py:36: ModuleNotFoundError

 

transform_functions is a notebook at the same level as test_trans.py, so I am a bit confused as to why I am getting this error...
Full unit test function code which causes the error:

import pytest
import pyspark
from pyspark.sql import SparkSession, DataFrame
from pyspark.testing import assertDataFrameEqual, assertSchemaEqual
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, FloatType, DateType, ArrayType


TESTFILE_PATH = 'file:/Workspace/Users/deadmanhide@gmail.com/test_data/testdata.json'

@pytest.fixture()
def get_sparksession():
    return SparkSession.builder \
                    .appName('integrity-tests') \
                    .getOrCreate()


@pytest.fixture()
def get_test_df(get_sparksession) -> DataFrame:
    spark = get_sparksession
    expected_schema = StructType([
        StructField("purchase_date", StringType(), True),
        StructField("customer_id", IntegerType(), True),
        StructField("amount", FloatType(), True),
        StructField("category", StringType(), True),
        StructField("city", StringType(), True),
        StructField("address", StructType([
            StructField("street_name", StringType(), True),
            StructField("street_number", IntegerType(), True),
            StructField("zip_code", StringType(), True)
        ]), True)
    ])

    return spark.read.format('json').schema(expected_schema).load(TESTFILE_PATH)


def test_check_columns_exist(get_sparksession, get_test_df):
    from transform_functions import *
    spark = get_sparksession
    test_df = get_test_df

    assert check_columns_exists(test_df, 'purchase_date', 'customer_id', 'amount', 'category', 'city', 'address', 'street_name', 'street_number', 'zip_code') is True
    assert check_columns_exists(test_df, 'sale_date', 'customer_id', 'amount', 'category', 'city', 'address', 'street_name', 'street_number', 'zip_code') is False
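A likely root cause here (an inference from the error, not confirmed in the thread): a Databricks notebook is not a plain `.py` file on the filesystem, so Python's import machinery cannot load it with `import transform_functions`; only regular Python files are importable as modules. Moving the functions into a plain file fixes that. Below is a minimal, Spark-free sketch of such a file, with `check_columns_exists` reduced to operate on a plain column-name list so it runs anywhere; the real function presumably takes a DataFrame and inspects `df.columns`.

```python
# transform_functions.py -- a plain .py file (not a notebook), so that
# `import transform_functions` works from test_trans.py.

def check_columns_exists(columns, *expected):
    """Return True only if every expected column name is present.

    Accepts the column list directly in this sketch; the real function
    presumably takes a DataFrame and reads df.columns.
    """
    return all(name in columns for name in expected)


# Mirrors the shape of the two assertions in test_check_columns_exist:
cols = ["purchase_date", "customer_id", "amount", "category", "city", "address"]
print(check_columns_exists(cols, "purchase_date", "amount"))  # True
print(check_columns_exists(cols, "sale_date"))                # False
```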

 

StephanKnox
New Contributor II

PS: I have restarted the cluster and re-ran my run_tests notebook, and now I am getting a different error:

E     File "/Workspace/Repos/SBIT/SBIT/test_trans.py", line 36
E       from transform_functions import *
E       ^
E   SyntaxError: import * only allowed at module level

 

I am totally confused now …
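For what it's worth, this second error is unrelated to paths: Python rejects `from module import *` anywhere except module level (the top of the file), and it does so at compile time, before any test runs. A small demonstration of that, plus the usual fix of naming the imports explicitly at the top of test_trans.py (`check_columns_exists` is taken from the test shown above):

```python
# A star import inside a function body fails to compile at all:
src = (
    "def test_check_columns_exist():\n"
    "    from transform_functions import *\n"
)
raised = False
try:
    compile(src, "test_trans.py", "exec")
except SyntaxError:
    raised = True

print(raised)  # True: rejected before anything is executed

# Fix: move the import to the top of test_trans.py, preferably naming
# what you need explicitly:
#   from transform_functions import check_columns_exists
```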
