
PyTest working in Repos but not in Databricks Asset Bundles

ChrisLawford
New Contributor

Hello,
I am trying to run PyTest from a notebook or Python file that has been deployed to the workspace by a Databricks Asset Bundle (DAB).
I have a repository containing a number of files, with the end goal of running PyTest against a directory to validate my code.
I will explain the structure of the repo and the steps to reproduce the issue, but in essence I am seeing different behavior from the same code when running it from `/Workspace/Repos/USER_EMAIL/REPO_NAME/NOTEBOOK_FILE` versus `/Workspace/Users/USER_EMAIL/.bundle/BUNDLE_NAME/dev/files/NOTEBOOK_FILE`.
When running from the Repos folder, the NOTEBOOK_FILE that invokes pytest runs and the tests pass. When running from the DAB folder, the same NOTEBOOK_FILE runs pytest but fails with the following error:

 

________________________ ERROR collecting spark_test.py ________________________
ImportError while importing test module '/Workspace/Users/USER_EMAIL/.bundle/any-name-you-want/dev/files/spark_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
E   ModuleNotFoundError: No module named 'spark_test'

 

Files and folder Structure

The files and folder structure in the repo:

 

REPO_NAME/
├── execute_pytest.py
├── execute_pytest_nb.py
├── databricks.yml
└── spark_test.py

 

execute_pytest.py

 

import pytest
import os
import sys
    
# Skip writing pyc files on a readonly filesystem.
sys.dont_write_bytecode = True
    
# Run pytest.
retcode = pytest.main([".", "-v", "-p", "no:cacheprovider"])
# Fail the cell execution if there are any test failures.
assert retcode == 0, "The pytest invocation failed. See the log for details."

 

execute_pytest_nb.py

 

# Databricks notebook source
import pytest
import os
import sys

# Skip writing pyc files on a readonly filesystem.
sys.dont_write_bytecode = True
    
# Run pytest.
retcode = pytest.main([".", "-v", "-p", "no:cacheprovider"])
# Fail the cell execution if there are any test failures.
assert retcode == 0, "The pytest invocation failed. See the log for details."

 

spark_test.py

 

from pyspark.sql import SparkSession
import pytest
@pytest.fixture
def spark() -> SparkSession:
  # Create a SparkSession (the entry point to Spark functionality) on
  # the cluster in the remote Databricks workspace. Unit tests do not
  # have access to this SparkSession by default.
  return SparkSession.builder.getOrCreate()


# COMMAND ----------

def test_scenario_a(spark):
    assert 1==1
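
For context only (not part of the repro above), a test that actually exercises the spark fixture would look something like the hypothetical example below:

def test_create_dataframe(spark):
    # Build a tiny DataFrame on the cluster's SparkSession and check its size.
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"])
    assert df.count() == 2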

 

databricks.yml

 

bundle:
  name: any-name-you-want

targets:
  # The 'dev' target, used for development purposes.
  # Whenever a developer deploys using 'dev', they get their own copy.
  dev:
    # We use 'mode: development' to make sure everything deployed to this target gets a prefix
    # like '[dev my_user_name]'. Setting this mode also disables any schedules and
    # automatic triggers for jobs and enables the 'development' mode for Delta Live Tables pipelines.
    mode: development
    default: true
    workspace:
      host: https://adb-XXXXXXXXXXXXXXXXX.azuredatabricks.net

 

Cluster Specs:

DBR: 14.3 LTS ML

Libraries: PyPI PyTest

Steps to reproduce working pytest in Databricks Repos:

  1. Clone the repo into the workspace at the `/Workspace/Repos/USER_EMAIL/REPO_NAME/` location.
  2. Open the "execute_pytest.py" file, which should now exist at "/Workspace/Repos/USER_EMAIL/REPO_NAME/execute_pytest.py".
  3. Attach the cluster and run all.

Steps to reproduce failing pytest in a Databricks Asset Bundle:

  1. Clone the repo to your local computer.
  2. From the root of the repo, open a terminal and run `databricks bundle deploy` (assuming you already have the Databricks CLI installed and configured for the workspace).
  3. In the workspace, navigate to the notebook "execute_pytest.py", which should now exist at "/Workspace/Users/USER_EMAIL/.bundle/any-name-you-want/dev/files/execute_pytest.py".
  4. Attach the cluster and run all.

Things that have been tried:

  • I have verified that the same outcome occurs regardless of whether I use a Python file or a notebook; that is why the repo contains both "execute_pytest.py" (Python file) and "execute_pytest_nb.py" (notebook).
  • Adding the CWD to sys.path, as referenced here. I have also tried this with a pytest.ini file, as referenced here. (A rough sketch of the path manipulation I attempted is shown after this list.)
  • I have tried different file names.
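
For reference, this is roughly the kind of path manipulation I attempted before invoking pytest; the directory resolution here is illustrative rather than the exact code I ran:

import os
import sys

import pytest

# Skip writing pyc files on a readonly filesystem.
sys.dont_write_bytecode = True

# Resolve the directory containing this file and make sure it is importable.
# __file__ is not defined when run as a notebook, so fall back to the CWD.
try:
    test_dir = os.path.dirname(os.path.abspath(__file__))
except NameError:
    test_dir = os.getcwd()

if test_dir not in sys.path:
    sys.path.insert(0, test_dir)

# Point pytest explicitly at the resolved directory rather than ".".
retcode = pytest.main([test_dir, "-v", "-p", "no:cacheprovider"])
assert retcode == 0, "The pytest invocation failed. See the log for details."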
2 REPLIES

Kaniz_Fatma
Community Manager

Hi @ChrisLawford, the error message you're encountering indicates that the spark_test module is not being found.

  • Ensure that your directory structure is set up correctly.
  • Sometimes, pytest may not discover your modules due to path issues; a quick diagnostic sketch is shown below.
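
As a starting point, something like the following (run from the same notebook or file; the names are purely illustrative) can show whether the working directory and sys.path actually point at the bundle's files directory:

import os
import sys

# Show where pytest will collect tests from and what Python can import from.
print("cwd:", os.getcwd())
print("sys.path entries:")
for entry in sys.path:
    print("  ", entry)

# List the files visible in the current directory (spark_test.py should be here).
print("files in cwd:", os.listdir(os.getcwd()))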

ChrisLawford
New Contributor

Hello @Kaniz_Fatma,

Thank you for your response. I am aware of what the error message means, and that is exactly why I am requesting support. What I am trying to understand is why the same code deployed to two different locations in a workspace behaves differently. Have you tried to replicate the issue? I have supplied all of the code necessary to demonstrate it.

I assume it will come down to a pathing issue, as I can rule out the directory structure being incorrect: the code works when deployed to a Databricks Repo but not when deployed as a Databricks Asset Bundle.
I look forward to your response.
