Unit tests in notebook not working

RabahO
New Contributor III

Hello, 

I'm trying to set up a notebook for tests (or data quality checks; the name doesn't matter).

First I read a table: the output of the ETL process (the actual data).

Then I read another table and compute the expected data in the notebook.

I'm stuck at the assertEqual(actual_df, expected_df) step: the assertion never seems to run, whatever library I use.

I tried Chispa (a PySpark testing library that is very convenient: it avoids manual collects and shows the exact rows that differ), but it didn't work. Then I tried the unittest module, with the same result.

It's as if the step where the collect happens is skipped and the assertion is never triggered. (The collect works if I run it in any other cell.)
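For reference, the row-level check that Chispa automates can be sketched in plain Python once both DataFrames have been collected to the driver (e.g. with df.collect()). The helpers diff_rows and assert_rows_equal below are hypothetical, not part of Chispa or unittest; they assume each row has already been converted to a hashable tuple (e.g. tuple(row)) and compare the two sides as multisets, so row order is ignored but duplicate rows still count.

```python
from collections import Counter


def diff_rows(actual_rows, expected_rows):
    """Return (missing, unexpected) rows, comparing both inputs as multisets.

    missing: rows in expected_rows that are absent from actual_rows.
    unexpected: rows in actual_rows that are absent from expected_rows.
    Order is ignored; duplicate counts are respected.
    """
    actual_counts = Counter(actual_rows)
    expected_counts = Counter(expected_rows)
    # Counter subtraction keeps only keys whose count stays positive.
    missing = list((expected_counts - actual_counts).elements())
    unexpected = list((actual_counts - expected_counts).elements())
    return missing, unexpected


def assert_rows_equal(actual_rows, expected_rows):
    """Raise AssertionError naming the differing rows (similar in spirit to Chispa's report)."""
    missing, unexpected = diff_rows(actual_rows, expected_rows)
    if missing or unexpected:
        raise AssertionError(
            f"missing from actual: {missing}; unexpected in actual: {unexpected}"
        )


# Hypothetical rows standing in for the collected actual and expected DataFrames.
actual = [("a", 1), ("b", 2), ("c", 3)]
expected = [("c", 3), ("a", 1), ("b", 20)]
print(diff_rows(actual, expected))  # ([('b', 20)], [('b', 2)])
```

Collecting is only practical for small result sets; for large tables, running DataFrame.exceptAll in both directions keeps the comparison distributed instead of pulling every row to the driver.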

Here's some code to show you the logic:

# cell 1: expected data (query elided)
expected_data_query = "select ***"
expected_data_df = spark.sql(expected_data_query)

# cell 2: actual data, i.e. the ETL output (query elided)
actual_data_query = "select ***"
actual_data_df = spark.sql(actual_data_query)

# cell 3
from chispa import assert_df_equality

# Starts a Spark job, but its stages all end up in the "skipped" state.
assert_df_equality(actual_data_df, expected_data_df)

# cell 4
# Same problem as cell 3. I can't find the exact code, but I subclassed
# unittest.TestCase, wrote a test method, and ran it as the documentation describes:
import pytest

retcode = pytest.main([".", "-v", "-p", "no:cacheprovider"])

2 replies (1 accepted solution, by feiyun0112)

feiyun0112
Honored Contributor

You can use Nutter, a testing framework for Databricks notebooks:

microsoft/nutter: Testing framework for Databricks notebooks (github.com)

RabahO
New Contributor III

Thank you for suggesting Nutter. I tried it and it seems to solve my problem.
