PyTest working in Repos but not in Databricks Asset Bundles
07-03-2024 04:46 AM
Hello,
I am trying to run pytest from a notebook or Python file that has been deployed to the workspace by a Databricks Asset Bundle (DAB).
I have a repository containing a handful of files; the end goal is to run pytest against a directory to validate my code.
I will explain the structure of the repo and the steps to reproduce the issue, but in essence I am seeing different behaviour from the same code depending on whether it runs from `/Workspace/Repos/USER_EMAIL/REPO_NAME/NOTEBOOK_FILE` or from `/Workspace/Users/USER_EMAIL/.bundle/BUNDLE_NAME/dev/files/NOTEBOOK_FILE`.
When running from the Repos folder, the NOTEBOOK_FILE that invokes pytest runs and the tests pass. When running from the DAB folder, the same NOTEBOOK_FILE runs pytest but fails with the error:
________________________ ERROR collecting spark_test.py ________________________
ImportError while importing test module '/Workspace/Users/USER_EMAIL/.bundle/any-name-you-want/dev/files/spark_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.10/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
E ModuleNotFoundError: No module named 'spark_test'
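To make the difference between the two locations concrete, a diagnostic cell along these lines can be run from both paths (a minimal sketch; it assumes nothing beyond the standard library):

import os
import sys

# Show the interpreter's view of the environment so the Repos run and the
# bundle run can be compared side by side.
print("cwd:", os.getcwd())
print("sys.path:")
for entry in sys.path:
    print("  ", entry)

# Check whether the test module is even visible from the working directory.
print("spark_test.py present:", os.path.exists(os.path.join(os.getcwd(), "spark_test.py")))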
Files and folder structure
The files and folder structure in the repo:
REPO_NAME/
├── execute_pytest.py
├── execute_pytest_nb.py
├── databricks.yml
└── spark_test.py
execute_pytest.py
import pytest
import os
import sys
# Skip writing pyc files on a readonly filesystem.
sys.dont_write_bytecode = True
# Run pytest.
retcode = pytest.main([".", "-v", "-p", "no:cacheprovider"])
# Fail the cell execution if there are any test failures.
assert retcode == 0, "The pytest invocation failed. See the log for details."
execute_pytest_nb.py
# Databricks notebook source
import pytest
import os
import sys
# Skip writing pyc files on a readonly filesystem.
sys.dont_write_bytecode = True
# Run pytest.
retcode = pytest.main([".", "-v", "-p", "no:cacheprovider"])
# Fail the cell execution if there are any test failures.
assert retcode == 0, "The pytest invocation failed. See the log for details."
spark_test.py
from pyspark.sql import SparkSession
import pytest

@pytest.fixture
def spark() -> SparkSession:
    # Create a SparkSession (the entry point to Spark functionality) on
    # the cluster in the remote Databricks workspace. Unit tests do not
    # have access to this SparkSession by default.
    return SparkSession.builder.getOrCreate()

# COMMAND ----------

def test_scenario_a(spark):
    assert 1 == 1
databricks.yml
bundle:
  name: any-name-you-want

targets:
  # The 'dev' target, used for development purposes.
  # Whenever a developer deploys using 'dev', they get their own copy.
  dev:
    # We use 'mode: development' to make sure everything deployed to this target gets a prefix
    # like '[dev my_user_name]'. Setting this mode also disables any schedules and
    # automatic triggers for jobs and enables the 'development' mode for Delta Live Tables pipelines.
    mode: development
    default: true
    workspace:
      host: https://adb-XXXXXXXXXXXXXXXXX.azuredatabricks.net
Cluster Specs:
DBR: 14.3 LTS ML
Libraries: PyPI PyTest
Steps to reproduce pytest working in Databricks Repos:
- Add the repo to the workspace at the `/Workspace/Repos/USER_EMAIL/REPO_NAME/` location.
- Open the "execute_pytest.py" file, which should now exist at "/Workspace/Repos/USER_EMAIL/REPO_NAME/execute_pytest.py".
- Attach the cluster and run all.
Steps to reproduce pytest failing in a Databricks Asset Bundle:
- Clone the repo to your local computer.
- From the root of the repo, open a terminal and run `databricks bundle deploy` (assuming you have the Databricks CLI installed and configured for the workspace).
- In the workspace, navigate to the notebook "execute_pytest.py", which should now exist at "/Workspace/Users/USER_EMAIL/.bundle/any-name-you-want/dev/files/execute_pytest.py".
- Attach the cluster and run all.
Things that have been tried:
- I have confirmed that the same outcome occurs regardless of whether a Python file or a notebook is used. That is why the repo contains both "execute_pytest.py" (a Python file) and "execute_pytest_nb.py" (a notebook).
- Adding the CWD to sys.path, as referenced here, roughly as in the sketch after this list. I have also tried this with a pytest.ini file, as referenced here.
- I have tried different file names.
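For clarity, the sys.path attempt looked roughly like this (a sketch; it assumes os.getcwd() resolves to the directory that holds spark_test.py, which may be exactly what differs between the two locations):

import os
import sys
import pytest

# Make the directory containing the tests importable before pytest collects them.
test_dir = os.getcwd()
if test_dir not in sys.path:
    sys.path.insert(0, test_dir)

# Skip writing pyc files on a readonly filesystem.
sys.dont_write_bytecode = True
retcode = pytest.main([".", "-v", "-p", "no:cacheprovider"])
assert retcode == 0, "The pytest invocation failed. See the log for details."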
Labels: Workflows
07-05-2024 12:53 AM - edited 07-05-2024 01:17 AM
Hello @Retired_mod,
Thank you for your response. I am aware of what the error message means, and that is exactly why I am requesting support: the same code deployed to two different locations in a workspace behaving differently is what I am trying to understand. Have you tried to replicate the issue? I have supplied all of the code needed to reproduce it.
I assume it will turn out to be a pathing issue, since I can rule out the directory structure being incorrect: the code works when deployed to a Databricks Repo but not when deployed as a Databricks Asset Bundle.
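If it is a pathing issue, one inexpensive probe (a sketch, not a confirmed fix) is to run pytest with --import-mode=importlib, which bypasses the default "prepend" mode that inserts the test file's directory into sys.path; if collection then succeeds, the difference between the two locations is almost certainly in how sys.path is populated:

import sys
import pytest

sys.dont_write_bytecode = True

# The importlib import mode avoids pytest's default sys.path manipulation,
# so a success here points at sys.path handling as the culprit.
retcode = pytest.main([".", "-v", "--import-mode=importlib", "-p", "no:cacheprovider"])
assert retcode == 0, "The pytest invocation failed. See the log for details."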
I look forward to your response.
10-21-2024 09:21 AM
Hey, Chris. Did you ever get this working? Same issue here.
10-21-2024 04:20 PM
I think you need to wrap your code into a Python wheel file: Develop a Python wheel file using Databricks Asset Bundles | Databricks on AWS
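As a rough illustration of that approach (a sketch only; the artifact name and build command are assumptions, and it presumes a setup.py or pyproject.toml at the repo root), the bundle config gains an artifacts section so the wheel is built and uploaded at deploy time:

artifacts:
  default:
    type: whl
    build: python -m build --wheel
    path: .

The tests then import the packaged modules from the installed wheel rather than relying on workspace file paths.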
a month ago
@ChrisLawford
You can run pytest through a job:
databricks bundle run -t dev pytest_job
I was able to work around the issue this way.
resource/pytest.job.yml
resources:
  jobs:
    pytest_job:
      name: pytest_job
      tasks:
        - task_key: pytest_task
          notebook_task:
            notebook_path: src/pytest
src/pytest.ipynb
# pytest.main runs our tests directly in the notebook environment, giving
# them full fidelity with the Spark session and other configuration
# available to notebooks.
#
# A limitation of this approach is that changes to the tests can be
# masked by Python's import caching mechanism.
#
# To iterate on tests during development, we restart the Python process
# and thus clear the import cache to pick up changes.
dbutils.library.restartPython()
import pytest
import os
import sys
# Change the working directory to the repository root so test paths resolve.
notebook_path = dbutils.notebook.entry_point.getDbutils().notebook().getContext().notebookPath().get()
repo_root = os.path.dirname(os.path.dirname(notebook_path))
os.chdir(f'/Workspace/{repo_root}')
%pwd
# Skip writing pyc files on a readonly filesystem.
sys.dont_write_bytecode = True
retcode = pytest.main(["./tests/test_sample.py", "-p", "no:cacheprovider"])
# Fail the cell execution if we have any test failures.
assert retcode == 0, 'The pytest invocation failed. See the log above for details.'
tests/test_sample.py
def test_aa():
    assert True

