PyTest working in Repos but not in Databricks Asset Bundles
07-03-2024 04:46 AM
Hello,
I am trying to run pytest from a notebook or Python file that has been deployed to the workspace by a Databricks Asset Bundle (DAB).
I have a repository containing a handful of files; the end goal is to run pytest against a directory to validate my code.
I will explain the structure of the repo and the steps to reproduce the issue, but in essence I am seeing different behaviour from the same code depending on whether it runs from `/Workspace/Repos/USER_EMAIL/REPO_NAME/NOTEBOOK_FILE` or from `/Workspace/Users/USER_EMAIL/.bundle/BUNDLE_NAME/dev/files/NOTEBOOK_FILE`.
When running from the Repos folder, the NOTEBOOK_FILE that invokes pytest runs and the tests pass. When running from the DAB folder, the same NOTEBOOK_FILE runs pytest but fails with the error:
________________________ ERROR collecting spark_test.py ________________________
ImportError while importing test module '/Workspace/Users/USER_EMAIL/.bundle/any-name-you-want/dev/files/spark_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.10/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
E ModuleNotFoundError: No module named 'spark_test'
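To make the difference between the two locations concrete, a diagnostic cell along these lines can be run from both paths (a minimal sketch; it assumes nothing beyond the standard library):

import os
import sys

# Show the interpreter's view of the environment so the Repos run and the
# bundle run can be compared side by side.
print("cwd:", os.getcwd())
print("sys.path:")
for entry in sys.path:
    print("  ", entry)

# Check whether the test module is even visible from the working directory.
print("spark_test.py present:", os.path.exists(os.path.join(os.getcwd(), "spark_test.py")))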
Files and folder structure
The files and folder structure in the repo:
REPO_NAME/
├── execute_pytest.py
├── execute_pytest_nb.py
├── databricks.yml
└── spark_test.py
execute_pytest.py
import pytest
import os
import sys
# Skip writing pyc files on a readonly filesystem.
sys.dont_write_bytecode = True
# Run pytest.
retcode = pytest.main([".", "-v", "-p", "no:cacheprovider"])
# Fail the cell execution if there are any test failures.
assert retcode == 0, "The pytest invocation failed. See the log for details."
execute_pytest_nb.py
# Databricks notebook source
import pytest
import os
import sys
# Skip writing pyc files on a readonly filesystem.
sys.dont_write_bytecode = True
# Run pytest.
retcode = pytest.main([".", "-v", "-p", "no:cacheprovider"])
# Fail the cell execution if there are any test failures.
assert retcode == 0, "The pytest invocation failed. See the log for details."
spark_test.py
from pyspark.sql import SparkSession
import pytest

@pytest.fixture
def spark() -> SparkSession:
    # Create a SparkSession (the entry point to Spark functionality) on
    # the cluster in the remote Databricks workspace. Unit tests do not
    # have access to this SparkSession by default.
    return SparkSession.builder.getOrCreate()

# COMMAND ----------

def test_scenario_a(spark):
    assert 1 == 1
databricks.yml
bundle:
  name: any-name-you-want

targets:
  # The 'dev' target, used for development purposes.
  # Whenever a developer deploys using 'dev', they get their own copy.
  dev:
    # We use 'mode: development' to make sure everything deployed to this target gets a prefix
    # like '[dev my_user_name]'. Setting this mode also disables any schedules and
    # automatic triggers for jobs and enables the 'development' mode for Delta Live Tables pipelines.
    mode: development
    default: true
    workspace:
      host: https://adb-XXXXXXXXXXXXXXXXX.azuredatabricks.net
Cluster Specs:
DBR: 14.3 LTS ML
Libraries: PyPI PyTest
Steps to reproduce pytest working in Databricks Repos:
- Add the repo to the workspace at the `/Workspace/Repos/USER_EMAIL/REPO_NAME/` location.
- Open the "execute_pytest.py" file, which should now exist at "/Workspace/Repos/USER_EMAIL/REPO_NAME/execute_pytest.py".
- Attach the cluster and run all.
Steps to reproduce pytest failing in a Databricks Asset Bundle:
- Clone the repo to your local computer.
- From the root of the repo, open a terminal and run `databricks bundle deploy` (assuming you have the Databricks CLI installed and configured for the workspace).
- In the workspace, navigate to the notebook "execute_pytest.py", which should now exist at "/Workspace/Users/USER_EMAIL/.bundle/any-name-you-want/dev/files/execute_pytest.py".
- Attach the cluster and run all.
Things that have been tried:
- I have confirmed that the same outcome occurs regardless of whether a Python file or a notebook is used. That is why the repo contains both "execute_pytest.py" (a Python file) and "execute_pytest_nb.py" (a notebook).
- Adding the CWD to sys.path, as referenced here, roughly as in the sketch after this list. I have also tried this with a pytest.ini file, as referenced here.
- I have tried different file names.
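For clarity, the sys.path attempt looked roughly like this (a sketch; it assumes os.getcwd() resolves to the directory that holds spark_test.py, which may be exactly what differs between the two locations):

import os
import sys
import pytest

# Make the directory containing the tests importable before pytest collects them.
test_dir = os.getcwd()
if test_dir not in sys.path:
    sys.path.insert(0, test_dir)

# Skip writing pyc files on a readonly filesystem.
sys.dont_write_bytecode = True
retcode = pytest.main([".", "-v", "-p", "no:cacheprovider"])
assert retcode == 0, "The pytest invocation failed. See the log for details."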
Labels: Workflows
07-05-2024 12:53 AM - edited 07-05-2024 01:17 AM
Hello @Retired_mod,
Thank you for your response. I am aware of what the error message means, and that is exactly why I am requesting support: the same code deployed to two different locations in a workspace behaving differently is what I am trying to understand. Have you tried to replicate the issue? I have supplied all of the code needed to reproduce it.
I assume it will turn out to be a pathing issue, since I can rule out the directory structure being incorrect: the code works when deployed to a Databricks Repo but not when deployed as a Databricks Asset Bundle.
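If it is a pathing issue, one inexpensive probe (a sketch, not a confirmed fix) is to run pytest with --import-mode=importlib, which bypasses the default "prepend" mode that inserts the test file's directory into sys.path; if collection then succeeds, the difference between the two locations is almost certainly in how sys.path is populated:

import sys
import pytest

sys.dont_write_bytecode = True

# The importlib import mode avoids pytest's default sys.path manipulation,
# so a success here points at sys.path handling as the culprit.
retcode = pytest.main([".", "-v", "--import-mode=importlib", "-p", "no:cacheprovider"])
assert retcode == 0, "The pytest invocation failed. See the log for details."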
I look forward to your response.
10-21-2024 09:21 AM
Hey, Chris. Did you ever get this working? Same issue here.
10-21-2024 04:20 PM
I think you need to wrap your code into a Python wheel file: Develop a Python wheel file using Databricks Asset Bundles | Databricks on AWS
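As a rough illustration of that approach (a sketch only; the artifact name and build command are assumptions, and it presumes a setup.py or pyproject.toml at the repo root), the bundle config gains an artifacts section so the wheel is built and uploaded at deploy time:

artifacts:
  default:
    type: whl
    build: python -m build --wheel
    path: .

The tests then import the packaged modules from the installed wheel rather than relying on workspace file paths.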
a month ago
@ChrisLawford
You can run pytest through a job:
databricks bundle run -t dev pytest_job
I was able to work around the issue this way.
resource/pytest.job.yml
resources:
  jobs:
    pytest_job:
      name: pytest_job
      tasks:
        - task_key: pytest_task
          notebook_task:
            notebook_path: src/pytest
src/pytest.ipynb
# pytest.main runs our tests directly in the notebook environment, giving
# them full fidelity with the Spark session and other configuration
# available to notebooks.
#
# A limitation of this approach is that changes to the tests can be
# masked by Python's import caching mechanism.
#
# To iterate on tests during development, we restart the Python process
# and thus clear the import cache to pick up changes.
dbutils.library.restartPython()
import pytest
import os
import sys
# Change the working directory to the repository root so test paths resolve.
notebook_path = dbutils.notebook.entry_point.getDbutils().notebook().getContext().notebookPath().get()
repo_root = os.path.dirname(os.path.dirname(notebook_path))
os.chdir(f'/Workspace/{repo_root}')
%pwd
# Skip writing pyc files on a readonly filesystem.
sys.dont_write_bytecode = True
retcode = pytest.main(["./tests/test_sample.py", "-p", "no:cacheprovider"])
# Fail the cell execution if we have any test failures.
assert retcode == 0, 'The pytest invocation failed. See the log above for details.'
tests/test_sample.py
def test_aa():
    assert True

