PyTest working in Repos but not in Databricks Asset Bundles

ChrisLawford
New Contributor II

Hello,
I am trying to run PyTest from a notebook or Python file that has been deployed to the workspace by a Databricks Asset Bundle (DAB).
I have a repository containing a number of files, with the end goal of running PyTest in a directory to validate my code.
I shall explain the structure of the repo and the steps to reproduce the issue, but in essence I am seeing different behavior from the same code when running in `/Workspace/Repos/USER_EMAIL/REPO_NAME/NOTEBOOK_FILE` versus `/Workspace/Users/USER_EMAIL/.bundle/BUNDLE_NAME/dev/files/NOTEBOOK_FILE`.
When running in the Repos folder, the NOTEBOOK_FILE that runs pytest completes with all tests passing. When running in the DAB folder, the same NOTEBOOK_FILE fails with the error:

 

________________________ ERROR collecting spark_test.py ________________________
ImportError while importing test module '/Workspace/Users/USER_EMAIL/.bundle/any-name-you-want/dev/files/spark_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
E   ModuleNotFoundError: No module named 'spark_test'

 

Files and folder structure

The files and folder structure in the repo:

 

REPO_NAME/
├── execute_pytest.py
├── execute_pytest_nb.py
├── databricks.yml
└── spark_test.py

 

execute_pytest.py

 

import pytest
import os
import sys

# Skip writing pyc files on a readonly filesystem.
sys.dont_write_bytecode = True

# Run pytest.
retcode = pytest.main([".", "-v", "-p", "no:cacheprovider"])
# Fail the cell execution if there are any test failures.
assert retcode == 0, "The pytest invocation failed. See the log for details."

 

execute_pytest_nb.py

 

# Databricks notebook source
import pytest
import os
import sys

# Skip writing pyc files on a readonly filesystem.
sys.dont_write_bytecode = True

# Run pytest.
retcode = pytest.main([".", "-v", "-p", "no:cacheprovider"])
# Fail the cell execution if there are any test failures.
assert retcode == 0, "The pytest invocation failed. See the log for details."

 

spark_test.py

 

from pyspark.sql import SparkSession
import pytest


@pytest.fixture
def spark() -> SparkSession:
    # Create a SparkSession (the entry point to Spark functionality) on
    # the cluster in the remote Databricks workspace. Unit tests do not
    # have access to this SparkSession by default.
    return SparkSession.builder.getOrCreate()


# COMMAND ----------

def test_scenario_a(spark):
    assert 1 == 1
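
For illustration (hypothetical, not part of the repo above), a test that actually exercises the `spark` fixture might look like this:

# Hypothetical example: a test that uses the spark fixture defined above.
def test_row_count(spark):
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
    assert df.count() == 2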

 

databricks.yml

 

bundle:
  name: any-name-you-want

targets:
  # The 'dev' target, used for development purposes.
  # Whenever a developer deploys using 'dev', they get their own copy.
  dev:
    # We use 'mode: development' to make sure everything deployed to this target gets a prefix
    # like '[dev my_user_name]'. Setting this mode also disables any schedules and
    # automatic triggers for jobs and enables the 'development' mode for Delta Live Tables pipelines.
    mode: development
    default: true
    workspace:
      host: https://adb-XXXXXXXXXXXXXXXXX.azuredatabricks.net

 

Cluster Specs:

DBR: 14.3 LTS ML

Libraries: pytest (installed from PyPI)

Steps to reproduce pytest working in Databricks Repos:

  1. Clone the repo into the workspace at the `/Workspace/Repos/USER_EMAIL/REPO_NAME/` location.
  2. Open the "execute_pytest.py" file, which should now exist at "/Workspace/Repos/USER_EMAIL/REPO_NAME/execute_pytest.py".
  3. Attach the cluster and run all.

Steps to reproduce pytest failing in a Databricks Asset Bundle (DAB):

  1. Clone the repo to your local computer.
  2. In the root of the repo, open a terminal and run `databricks bundle deploy` (assuming you have the Databricks CLI installed and configured for the workspace).
  3. In the workspace, navigate to the notebook "execute_pytest.py", which should now exist at "/Workspace/Users/USER_EMAIL/.bundle/any-name-you-want/dev/files/execute_pytest.py".
  4. Attach the cluster and run all.

Things that have been tried:

  • I have verified that the same outcome happens regardless of whether a Python file or a notebook is used; that is why the repo contains both "execute_pytest.py" (Python file) and "execute_pytest_nb.py" (notebook).
  • Adding the CWD to sys.path, as referenced here. I have also tried this with a pytest.ini file, as referenced here (see the sketch after this list for one more variation).
  • I have tried different file names.
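
One more variation that could be worth trying (a sketch, untested in this exact setup; `--import-mode=importlib` is a standard pytest flag, not Databricks-specific): put the test directory on `sys.path` explicitly and use pytest's importlib import mode, which avoids the `sys.path`-based module lookup that the ModuleNotFoundError points at.

import os
import sys

import pytest

# Skip writing pyc files on a readonly filesystem.
sys.dont_write_bytecode = True

# Assumption: this cell runs from the directory that contains spark_test.py,
# as in the layout above.
test_dir = os.getcwd()
if test_dir not in sys.path:
    sys.path.insert(0, test_dir)

# importlib import mode loads test modules by file location instead of
# resolving them through sys.path at collection time.
retcode = pytest.main([".", "-v", "--import-mode=importlib", "-p", "no:cacheprovider"])
assert retcode == 0, "The pytest invocation failed. See the log for details."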
4 REPLIES

ChrisLawford
New Contributor II

Hello @Retired_mod,

Thank you for your response. I am aware of what the error message means; that is exactly why I am requesting support. What I am trying to understand is why the same code behaves differently when deployed to two different locations in the same workspace. Have you tried to replicate the issue? I have supplied all of the code necessary to reproduce it.

I assume it will turn out to be a pathing issue, as I can rule out the directory structure being incorrect: the code works when deployed to a Databricks Repo but not when deployed as a Databricks Asset Bundle.
I look forward to your response.
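
One quick way to test the pathing hypothesis (a hypothetical diagnostic, not code from the repo) would be to run the same cell in both locations and compare the output:

import os
import sys

# Compare this output between /Workspace/Repos/... and
# /Workspace/Users/.../.bundle/...: if the Repos run shows the repo root on
# sys.path but the bundle run does not show the bundle files directory,
# that points at a pathing issue.
print("cwd:", os.getcwd())
for entry in sys.path:
    print("sys.path:", entry)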

538014
New Contributor II

Hey, Chris. Did you ever get this working? Same issue here.

uzi49
New Contributor II

I think you need to package your code as a Python wheel: see "Develop a Python wheel file using Databricks Asset Bundles" (Databricks on AWS).
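
For reference, the wheel approach roughly amounts to adding an `artifacts` section to `databricks.yml`; the artifact name and build command below are illustrative assumptions, not taken from this thread:

# Hypothetical sketch: build the repo into a wheel when the bundle is deployed.
artifacts:
  tests_wheel:
    type: whl
    build: python -m build --wheel
    path: .

A job task would then install the built wheel as a library and import the test code from the package instead of from loose workspace files.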

cinyoung
New Contributor II

@ChrisLawford 
You can run pytest through a job:

 

databricks bundle run -t dev pytest_job

 

I was able to work around the issue this way.

resource/pytest.job.yml

resources:
  jobs:
    pytest_job:
      name: pytest_job

      tasks:
        - task_key: pytest_task
          notebook_task:
            notebook_path: src/pytest

src/pytest.ipynb

# pytest.main runs our tests directly in the notebook environment, providing
# fidelity for Spark and other configuration variables.
#
# A limitation of this approach is that changes to the test modules can be
# hidden by Python's import caching mechanism.
#
# To iterate on tests during development, we restart the Python process
# and thus clear the import cache to pick up changes.
dbutils.library.restartPython()

import pytest
import os
import sys

# Change the working directory to the repository root so test paths resolve.
notebook_path = dbutils.notebook.entry_point.getDbutils().notebook().getContext().notebookPath().get()
repo_root = os.path.dirname(os.path.dirname(notebook_path))
os.chdir(f'/Workspace/{repo_root}')
%pwd

# Skip writing pyc files on a readonly filesystem.
sys.dont_write_bytecode = True

retcode = pytest.main(["./tests/test_sample.py", "-p", "no:cacheprovider"])

# Fail the cell execution if we have any test failures.
assert retcode == 0, 'The pytest invocation failed. See the log above for details.'

tests/test_sample.py

def test_aa():
    assert True

 

 
