<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Unit Testing DLT Pipelines in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/unit-testing-dlt-pipelines/m-p/75560#M34985</link>
    <description>&lt;P&gt;Now that we are moving our DLT pipelines into production, we would like to start unit testing the transformation logic inside our DLT notebooks.&lt;/P&gt;&lt;P&gt;We want to know how to unit test the PySpark logic/transformations independently, without having to spin up a DLT pipeline. This is mainly because you can run a DLT notebook and it will report that everything is fine and tell you to create a pipeline, but when you run the pipeline it then throws the actual errors, such as those caused by incorrect schema locations. It is also hard to debug transformations within DLT, as you can't readily inspect inputs/outputs or add debug logic.&lt;/P&gt;&lt;P&gt;Does anyone have any guidance on suitable approaches to unit testing DLT pipeline notebooks? Thanks&lt;/P&gt;</description>
    <pubDate>Mon, 24 Jun 2024 09:38:11 GMT</pubDate>
    <dc:creator>dm7</dc:creator>
    <dc:date>2024-06-24T09:38:11Z</dc:date>
    <item>
      <title>Unit Testing DLT Pipelines</title>
      <link>https://community.databricks.com/t5/data-engineering/unit-testing-dlt-pipelines/m-p/75560#M34985</link>
      <description>&lt;P&gt;Now that we are moving our DLT pipelines into production, we would like to start unit testing the transformation logic inside our DLT notebooks.&lt;/P&gt;&lt;P&gt;We want to know how to unit test the PySpark logic/transformations independently, without having to spin up a DLT pipeline. This is mainly because you can run a DLT notebook and it will report that everything is fine and tell you to create a pipeline, but when you run the pipeline it then throws the actual errors, such as those caused by incorrect schema locations. It is also hard to debug transformations within DLT, as you can't readily inspect inputs/outputs or add debug logic.&lt;/P&gt;&lt;P&gt;Does anyone have any guidance on suitable approaches to unit testing DLT pipeline notebooks? Thanks&lt;/P&gt;</description>
      <pubDate>Mon, 24 Jun 2024 09:38:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unit-testing-dlt-pipelines/m-p/75560#M34985</guid>
      <dc:creator>dm7</dc:creator>
      <dc:date>2024-06-24T09:38:11Z</dc:date>
    </item>
    <item>
      <title>Re: Unit Testing DLT Pipelines</title>
      <link>https://community.databricks.com/t5/data-engineering/unit-testing-dlt-pipelines/m-p/75965#M35120</link>
      <description>&lt;P&gt;Hi Kaniz - what if we have some CDC (change data capture) stages in a DLT pipeline?&lt;BR /&gt;E.g. we have a CDC stage which uses SCD type 1 to take the latest record based on datetime. How would we go about unit testing that this code functions correctly? It is a native DLT function, so we couldn't lift and shift it to a separate Python notebook.&lt;/P&gt;</description>
      <pubDate>Thu, 27 Jun 2024 17:01:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unit-testing-dlt-pipelines/m-p/75965#M35120</guid>
      <dc:creator>dm7</dc:creator>
      <dc:date>2024-06-27T17:01:51Z</dc:date>
    </item>
  </channel>
</rss>

