Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

PyTest working in Repos but not in Databricks Asset Bundles

ChrisLawford
New Contributor II

Hello,
I am trying to run PyTest from a notebook or Python file that was deployed by a Databricks Asset Bundle (DAB).
I have a repository containing a number of files, with the end goal of running PyTest in a directory to validate my code.
I shall explain the structure of the repo and the steps to reproduce the issue, but in essence I am seeing different behavior from the same code when running in `/Workspace/Repos/USER_EMAIL/REPO_NAME/NOTEBOOK_FILE` versus `/Workspace/Users/USER_EMAIL/.bundle/BUNDLE_NAME/dev/files/NOTEBOOK_FILE`.
When running in the Repos folder, I can run the NOTEBOOK_FILE that invokes pytest and see the tests pass. When running in the DAB folder, I can run the same NOTEBOOK_FILE, but pytest fails with the error:

 

________________________ ERROR collecting spark_test.py ________________________
ImportError while importing test module '/Workspace/Users/USER_EMAIL/.bundle/any-name-you-want/dev/files/spark_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
E   ModuleNotFoundError: No module named 'spark_test'

 

Files and folder Structure

The files and folder structure in the repo:

 

REPO_NAME/
├── execute_pytest.py
├── execute_pytest_nb.py
├── databricks.yml
└── spark_test.py

 

execute_pytest.py

 

import pytest
import os
import sys

# Skip writing pyc files on a readonly filesystem.
sys.dont_write_bytecode = True

# Run pytest.
retcode = pytest.main([".", "-v", "-p", "no:cacheprovider"])
# Fail the cell execution if there are any test failures.
assert retcode == 0, "The pytest invocation failed. See the log for details."
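
One thing worth checking in the failing case is whether the interpreter is resolving imports relative to the working directory or relative to the script's own location; the two may differ under a bundle deployment. A minimal sketch (the helper name is mine, and I am assuming `__file__` is populated when the file runs as a workspace Python file):

```python
import os
import sys

def prepend_dir(path: str) -> None:
    """Insert a directory at the front of sys.path if it is not already present."""
    if path not in sys.path:
        sys.path.insert(0, path)

# Hypothetical tweak: anchor imports on this script's own location rather than
# the working directory, falling back to the CWD if __file__ is unavailable.
script_dir = os.path.dirname(os.path.abspath(__file__)) if "__file__" in globals() else os.getcwd()
prepend_dir(script_dir)
```

Running this before `pytest.main` would make `spark_test.py` importable by plain module name if the bundle's `files` directory is otherwise missing from `sys.path`.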

 

execute_pytest_nb.py

 

# Databricks notebook source
import pytest
import os
import sys

# Skip writing pyc files on a readonly filesystem.
sys.dont_write_bytecode = True

# Run pytest.
retcode = pytest.main([".", "-v", "-p", "no:cacheprovider"])
# Fail the cell execution if there are any test failures.
assert retcode == 0, "The pytest invocation failed. See the log for details."

 

spark_test.py

 

from pyspark.sql import SparkSession
import pytest
@pytest.fixture
def spark() -> SparkSession:
  # Create a SparkSession (the entry point to Spark functionality) on
  # the cluster in the remote Databricks workspace. Unit tests do not
  # have access to this SparkSession by default.
  return SparkSession.builder.getOrCreate()


# COMMAND ----------

def test_scenario_a(spark):
    assert 1==1

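
Since the same file behaves differently in the two locations, it may also help to log the interpreter's view of the filesystem at collection time and diff the output between the Repos copy and the bundle copy. A small diagnostic sketch (the function name is mine, not from the thread):

```python
import os
import sys

def collection_context() -> dict:
    """Snapshot the paths that influence how pytest resolves test modules."""
    return {
        "cwd": os.getcwd(),
        "sys_path_head": sys.path[:5],
        "dont_write_bytecode": sys.dont_write_bytecode,
    }

# Run this from both the Repos folder and the .bundle folder and compare.
print(collection_context())
```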
 

databricks.yml

 

bundle:
  name: any-name-you-want

targets:
  # The 'dev' target, used for development purposes.
  # Whenever a developer deploys using 'dev', they get their own copy.
  dev:
    # We use 'mode: development' to make sure everything deployed to this target gets a prefix
    # like '[dev my_user_name]'. Setting this mode also disables any schedules and
    # automatic triggers for jobs and enables the 'development' mode for Delta Live Tables pipelines.
    mode: development
    default: true
    workspace:
      host: https://adb-XXXXXXXXXXXXXXXXX.azuredatabricks.net

 

Cluster Specs:

DBR: 14.3 LTS ML

Libraries: PyPI PyTest

Steps to reproduce pytest working in Databricks Repos:

  1. Add the repo to the workspace at the `/Workspace/Repos/USER_EMAIL/REPO_NAME/` location.
  2. Open the "execute_pytest.py" file, which should now exist at "/Workspace/Repos/USER_EMAIL/REPO_NAME/execute_pytest.py".
  3. Attach the cluster and run all.

Steps to reproduce pytest failing in a Databricks Asset Bundle:

  1. Clone the repo to your local computer.
  2. In the root of the repo, open a terminal and run `databricks bundle deploy` (assuming you have the Databricks CLI already installed and configured for the workspace).
  3. In the workspace, navigate to the notebook "execute_pytest.py", which should now exist at "/Workspace/Users/USER_EMAIL/.bundle/any-name-you-want/dev/files/execute_pytest.py".
  4. Attach the cluster and run all.

Things that have been tried:

  • I have tested that the same outcome occurs regardless of using a Python file or a notebook. That is why the repo contains both "execute_pytest.py" (Python file) and "execute_pytest_nb.py" (notebook).
  • Adding the CWD to `sys.path`, as referenced here. I have also tried this with a pytest.ini file, as referenced here.
  • I have tried different file names.
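
Another angle that fits the symptoms: pytest's `--import-mode=importlib` loads test modules without relying on pytest inserting their directories into `sys.path`, so it can sidestep path-insertion differences between the two locations. A hypothetical variant of execute_pytest.py (not something the thread confirms as a fix, and the actual `pytest.main` call is left commented so it only runs on a cluster with pytest installed):

```python
import sys

# Skip writing pyc files on a readonly filesystem.
sys.dont_write_bytecode = True

# importlib mode resolves test modules via importlib rather than sys.path
# insertion, which may behave more consistently under .bundle paths.
pytest_args = [".", "-v", "-p", "no:cacheprovider", "--import-mode=importlib"]

# On the cluster:
#   import pytest
#   retcode = pytest.main(pytest_args)
#   assert retcode == 0, "The pytest invocation failed. See the log for details."
```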
3 Replies

ChrisLawford
New Contributor II

Hello @Retired_mod,

Thank you for your response. I am aware of what the error message means, and that is exactly why I am requesting support. The same code deployed to two different locations in a workspace behaving differently is what I am trying to understand. Have you tried to replicate the issue? I have supplied all of the necessary code to demonstrate it.

I assume it will turn out to be a pathing issue, as I can rule out an incorrect directory structure: the code works when deployed to a Databricks Repo but not when deployed as a Databricks Asset Bundle.
I look forward to your response. 

538014
New Contributor II

Hey, Chris. Did you ever get this working? Same issue here.

uzi49
New Contributor II

I think you need to package your code as a Python wheel file: Develop a Python wheel file using Databricks Asset Bundles | Databricks on AWS
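
If you go the wheel route, the databricks.yml would gain an artifacts section along these lines. This is only a sketch, assuming a `setup.py` or `pyproject.toml` exists at the repo root; the artifact key and build command are illustrative:

```
artifacts:
  default:
    type: whl
    build: python -m build --wheel
    path: .
```

The deploy step then builds the wheel and uploads it, so tests can import the packaged modules instead of loose files in the bundle's `files` directory.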
