<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Unit tests in notebook not working in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/unit-tests-in-notebook-not-working/m-p/61973#M31864</link>
    <description>&lt;P&gt;Thank you for suggesting nutter. I tried it and it seems to solve my problem.&lt;/P&gt;</description>
    <pubDate>Mon, 26 Feb 2024 08:39:15 GMT</pubDate>
    <dc:creator>RabahO</dc:creator>
    <dc:date>2024-02-26T08:39:15Z</dc:date>
    <item>
      <title>Unit tests in notebook not working</title>
      <link>https://community.databricks.com/t5/data-engineering/unit-tests-in-notebook-not-working/m-p/61379#M31775</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I'm trying to set up a notebook for tests or data quality checks. The name is not important.&lt;/P&gt;&lt;P&gt;I read a table (the output of the ETL process, i.e. the actual data).&lt;/P&gt;&lt;P&gt;Then I read another table and do the calculation in the notebook (the expected data).&lt;/P&gt;&lt;P&gt;I'm stuck at the assertEqual(actual_df, expected_df) part. The assert never works, no matter which library I use.&lt;/P&gt;&lt;P&gt;I tried Chispa (a PySpark testing library, convenient because it avoids manual collects and helps show the exact rows where the differences are), but it didn't work. So I tried the unittest module, with the same problem.&lt;/P&gt;&lt;P&gt;It's as if the part where the collect happens is skipped and the assert is never triggered. (The collect works if I do it in any other cell.)&lt;/P&gt;&lt;P&gt;Here's some code to show the logic:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;# cell 1
expected_data_query = "select ***"
expected_data_df = spark.sql(expected_data_query)

# cell 2
actual_data_query = "select ***"
actual_data_df = spark.sql(actual_data_query)

# cell 3
# starts the pyspark job, then all of its stages end up in the "skipped" state
from chispa import assert_df_equality

# compare the actual ETL output against the expected data from cell 1
assert_df_equality(actual_data_df, expected_data_df)

# cell 4
# same result as cell 3
# I can't find the code anymore, but I wrote a class inheriting from unittest's
# TestCase, added a test method, and ran it the way the documentation describes:
import pytest

retcode = pytest.main([".", "-v", "-p", "no:cacheprovider"])


&lt;/LI-CODE&gt;</description>
      <pubDate>Wed, 21 Feb 2024 15:39:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unit-tests-in-notebook-not-working/m-p/61379#M31775</guid>
      <dc:creator>RabahO</dc:creator>
      <dc:date>2024-02-21T15:39:49Z</dc:date>
    </item>
    <item>
      <title>Re: Unit tests in notebook not working</title>
      <link>https://community.databricks.com/t5/data-engineering/unit-tests-in-notebook-not-working/m-p/61405#M31787</link>
      <description>&lt;P&gt;You can use nutter, a &lt;SPAN&gt;testing framework for Databricks notebooks&lt;/SPAN&gt;:&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/microsoft/nutter" target="_blank"&gt;microsoft/nutter: Testing framework for Databricks notebooks (github.com)&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 22 Feb 2024 00:56:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unit-tests-in-notebook-not-working/m-p/61405#M31787</guid>
      <dc:creator>feiyun0112</dc:creator>
      <dc:date>2024-02-22T00:56:16Z</dc:date>
    </item>
    <item>
      <title>Re: Unit tests in notebook not working</title>
      <link>https://community.databricks.com/t5/data-engineering/unit-tests-in-notebook-not-working/m-p/61973#M31864</link>
      <description>&lt;P&gt;Thank you for suggesting nutter. I tried it and it seems to solve my problem.&lt;/P&gt;</description>
      <pubDate>Mon, 26 Feb 2024 08:39:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unit-tests-in-notebook-not-working/m-p/61973#M31864</guid>
      <dc:creator>RabahO</dc:creator>
      <dc:date>2024-02-26T08:39:15Z</dc:date>
    </item>
  </channel>
</rss>

