Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

pytest unit testing help

ChristianRRL
Honored Contributor

Hi there, I am hoping someone can help me understand why I'm having issues with a simple pytest unit test... test?

When attempting to run a `run_tests_notebook` in our all-purpose compute cluster (Runtime 14.3) with the following:

import pytest
import os
import sys

# Run pytest.
retcode = pytest.main([".", "-v", "-p", "no:cacheprovider"])

# Fail the cell execution if there are any test failures.
assert retcode == 0, "The pytest invocation failed. See the log for details."

I get the following error message:

============================= test session starts ==============================
platform linux -- Python 3.10.12, pytest-9.0.2, pluggy-1.6.0 -- /local_disk0/.ephemeral_nfs/envs/pythonEnv-9d50e02c-01b0-47de-95e6-456ef5c86d1c/bin/python
rootdir: /Workspace/Users/username@company.com/sandbox/unit_test_sample
plugins: anyio-3.5.0
collecting ... collected 0 items / 1 error

==================================== ERRORS ====================================
____________________ ERROR collecting test_my_functions.py _____________________
ImportError while importing test module '/Workspace/Users/username@company.com/sandbox/unit_test_sample/test_my_functions.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
E   ModuleNotFoundError: No module named 'test_my_functions'
=========================== short test summary info ============================
ERROR test_my_functions.py
!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!
=============================== 1 error in 0.52s ===============================
AssertionError: The pytest invocation failed. See the log for details.

However, when I switch to the Serverless cluster and attempt to `Run tests` directly on my `test_my_functions.py` file (feature not available in Runtime 14.3), I am seeing that my unit tests are passing successfully:

 

 

Accepted Solutions

Hi there, I wanted to briefly follow-up on this. From what I'm seeing, the error is due to pytest trying to write __pycache__ under /Workspace/..., and that filesystem doesn’t support that write mode.

To resolve this, I applied two settings:

  • PYTHONDONTWRITEBYTECODE=1 (an environment variable)
  • --assert=plain (a pytest command line argument)
(I also added -p no:cacheprovider, but this is only to avoid writing out the .pytest_cache folder in my Workspace)
 
After applying these settings, I am finally seeing successful tests!
 
The same fix works when invoking pytest from Python, using:
  • sys.dont_write_bytecode = True
  • result = pytest.main(["--assert=plain", "-p", "no:cacheprovider", "-q"])
 
 
See attached examples.
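
For readers who prefer the environment variable form, here is a minimal sketch of launching pytest in a child process with the same settings. The helper names (`build_pytest_command`, `build_pytest_env`, `run_tests`) are made up for illustration and are not part of pytest or Databricks; only the flags and the `PYTHONDONTWRITEBYTECODE` variable come from the post above.

```python
import os
import subprocess
import sys


def build_pytest_command(test_dir: str = ".") -> list[str]:
    """Return a pytest command line that avoids writes to the test directory."""
    return [
        sys.executable, "-m", "pytest", test_dir,
        "--assert=plain",          # skip assertion rewriting
        "-p", "no:cacheprovider",  # skip the .pytest_cache folder
        "-q",
    ]


def build_pytest_env() -> dict[str, str]:
    """Copy the current environment and switch off bytecode writing."""
    env = dict(os.environ)
    env["PYTHONDONTWRITEBYTECODE"] = "1"
    return env


def run_tests(test_dir: str = ".") -> int:
    """Run pytest in a child process and return its exit code."""
    completed = subprocess.run(build_pytest_command(test_dir), env=build_pytest_env())
    return completed.returncode
```

Calling `run_tests(".")` from a notebook cell should behave like the `pytest.main` variant, with the difference that the tests run in a separate interpreter, so modules already imported in the notebook are not reused.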
 


4 REPLIES 4

ChristianRRL
Honored Contributor

My serverless successful unit test picture didn't go through for some reason in my original post:

serverless-successful-unit-test.png

wesleyfelipe
Contributor

Hi @ChristianRRL !

If you have your scripts in a regular Workspace folder, you might need to add the folder containing the module you intend to import to sys.path:

 

import sys
sys.path.append("/Workspace/Users/your_email/my_project")

 

But if you have your code in a Databricks Repo, that should not be required when running on Databricks Runtime 14.3 and above.
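
A small variation on the snippet above, in case the cell gets re-run many times: this sketch only adds the folder when it is not already on `sys.path`. The helper name `add_to_sys_path` and the `PROJECT_DIR` constant are hypothetical, and the path is the same placeholder used above.

```python
import sys


def add_to_sys_path(path: str) -> bool:
    """Put `path` at the front of sys.path; return True if it was added."""
    if path in sys.path:
        return False
    sys.path.insert(0, path)
    return True


# Placeholder folder; replace with the folder that holds your modules.
PROJECT_DIR = "/Workspace/Users/your_email/my_project"
add_to_sys_path(PROJECT_DIR)
```

This version uses `insert(0, ...)` rather than `append`, so the project folder takes priority over same-named modules elsewhere on the path. Whether that is desirable depends on your project; `append` is the more conservative choice.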

 


SteveOstrowski
Databricks Employee

Hi @ChristianRRL,

Your follow-up solution is spot on. To summarize and add some context for others who land here:

THE ROOT CAUSE

The ModuleNotFoundError you initially saw happens because the Workspace filesystem (/Workspace/...) does not fully support certain write operations that Python and pytest rely on by default. Specifically:

1. pytest's assertion rewriting mechanism tries to write compiled .pyc files into __pycache__ directories, which fails on the Workspace filesystem.
2. The cache provider plugin tries to write a .pytest_cache folder into the test directory. Your original command already disabled it with -p no:cacheprovider, so it was not the cause of this particular error.

When the .pyc write fails, pytest cannot properly collect and import your test modules, which surfaces as the misleading "ModuleNotFoundError: No module named 'test_my_functions'" error.
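
The bytecode write mentioned in point 1 is ordinary Python import behavior and can be observed on any machine. The sketch below uses only the standard library: it imports a throwaway module from a temporary folder with `sys.dont_write_bytecode` switched on and reports whether a `__pycache__` folder appeared. The function name and the module name are made up for illustration, and this does not reproduce the Workspace filesystem failure itself.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_without_bytecode(module_name: str, source: str) -> bool:
    """Import a throwaway module; return True if __pycache__ was created."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, f"{module_name}.py").write_text(source)
        old_flag = sys.dont_write_bytecode
        sys.dont_write_bytecode = True  # same switch as in the fix
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            importlib.import_module(module_name)
        finally:
            # Restore interpreter state so the demo has no side effects.
            sys.path.remove(tmp)
            sys.modules.pop(module_name, None)
            sys.dont_write_bytecode = old_flag
        return Path(tmp, "__pycache__").exists()


created = import_without_bytecode("bytecode_demo_module", "VALUE = 1\n")
print("__pycache__ created:", created)
```

With the flag on, no `__pycache__` folder should be created next to the source file, which is the behavior the fix relies on.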

THE FIX

As you discovered, the settings below resolve this. The Databricks documentation for running pytest in notebooks describes a similar setup:

import pytest
import sys

# Prevent Python from writing .pyc files to the Workspace filesystem
sys.dont_write_bytecode = True

# Run pytest with assertion rewriting disabled and cache provider off
retcode = pytest.main([".", "-v", "--assert=plain", "-p", "no:cacheprovider"])

assert retcode == 0, "The pytest invocation failed. See the log for details."

Here is what each setting does:

sys.dont_write_bytecode = True
Prevents Python from creating __pycache__/*.pyc files.

--assert=plain
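Disables pytest's assertion rewriting, which by default caches the rewritten test modules as .pyc files. The trade-off is that failed assertions are reported with less detailed introspection.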

-p no:cacheprovider
Disables the .pytest_cache directory that pytest uses to track test state between runs.
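
Separately from the flags, the `retcode` returned by `pytest.main` in the snippet above says more than pass or fail. As a rough sketch, the helper below (a hypothetical name, not part of pytest) maps pytest's documented exit codes to a short message, so the final assertion can say what went wrong. The collection error in the original post corresponds to exit code 2.

```python
# pytest's documented exit codes, written as plain integers so this
# helper does not need pytest to be importable.
EXIT_CODE_MEANINGS = {
    0: "all collected tests passed",
    1: "some tests failed",
    2: "run interrupted (for example by a collection error)",
    3: "internal pytest error",
    4: "pytest command line usage error",
    5: "no tests were collected",
}


def describe_exit_code(retcode: int) -> str:
    """Return a readable message for a pytest exit code."""
    code = int(retcode)  # also accepts pytest.ExitCode, which is an IntEnum
    meaning = EXIT_CODE_MEANINGS.get(code, "unknown exit code")
    return f"pytest exit code {code}: {meaning}"
```

It could then be used as `assert retcode == 0, describe_exit_code(retcode)` in place of the fixed message.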

WHY SERVERLESS WORKS WITHOUT THESE FLAGS

On Serverless compute, the "Run tests" button uses a built-in test runner that is already configured to handle these filesystem constraints. That is why your tests passed on Serverless without any extra arguments.

ADDITIONAL TIPS FOR PYTEST IN DATABRICKS

1. sys.path for Workspace folders: As another commenter mentioned, if your source code is in a regular Workspace folder (not a Git folder/Repo), you may need to add its path to sys.path so pytest can find your modules:

import sys
sys.path.append("/Workspace/Users/your_email/your_project")

For code stored in Git folders (Repos), this is handled automatically on Runtime 14.3 and above.

2. Organize tests alongside source files: Databricks recommends storing functions and tests as Python files (.py) rather than inside notebooks, since test frameworks work best with standard Python files.

3. Documentation reference: The Databricks docs have a full walkthrough of running pytest in notebooks here:
https://docs.databricks.com/en/notebooks/testing.html
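
To illustrate tip 2, here is a minimal sketch of what a function and its test might look like as plain Python files. The function `normalize_column_name` and its behavior are invented for this example; only the file name `test_my_functions.py` comes from the original post, and `my_functions.py` is an assumed name for the module under test. Both parts are shown in one block for brevity.

```python
# --- my_functions.py (assumed module name) ---
def normalize_column_name(name: str) -> str:
    """Trim, lowercase, and replace runs of whitespace with underscores."""
    return "_".join(name.strip().lower().split())


# --- test_my_functions.py ---
# In a real project this file would start with:
#     from my_functions import normalize_column_name
def test_normalize_column_name():
    assert normalize_column_name("  Order ID ") == "order_id"
    assert normalize_column_name("Customer  Full Name") == "customer_full_name"
```

pytest collects functions whose names start with `test_` from files named `test_*.py` or `*_test.py` by default, so no extra registration is needed.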

* This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.

If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.