<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Duplicates detected in transformed data - Help with troubleshooting in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/duplicates-detected-in-transformed-data-help-with/m-p/125267#M47395</link>
    <description>&lt;P&gt;Hello&lt;/P&gt;&lt;P&gt;Can anyone help with an error I am getting when running ADF? An ingestion pipeline fails, and when I click on the link I am taken to a Databricks error message: "7 duplicates detected in transformed data". However, when I run the transformation cell of the notebook in question, the data it produces has no issues and contains zero duplicate rows. Another notebook that references this notebook (and is also run as part of the ADF pipeline) has a check for duplicates, and that check is what causes the ADF ingestion pipeline to fail. Since I have been unable to replicate the error or identify any duplicate rows using the SQL that is run in the Databricks notebook, can anyone advise me on what I can do within Databricks to get it to tell me which 7 rows of data are in question? Sorry if this request is a bit muddled; I am new to Databricks.&lt;/P&gt;&lt;P&gt;Thank you&lt;/P&gt;</description>
    <pubDate>Tue, 15 Jul 2025 09:45:19 GMT</pubDate>
    <dc:creator>Firehose74</dc:creator>
    <dc:date>2025-07-15T09:45:19Z</dc:date>
    <item>
      <title>Duplicates detected in transformed data - Help with troubleshooting</title>
      <link>https://community.databricks.com/t5/data-engineering/duplicates-detected-in-transformed-data-help-with/m-p/125267#M47395</link>
      <description>&lt;P&gt;Hello&lt;/P&gt;&lt;P&gt;Can anyone help with an error I am getting when running ADF? An ingestion pipeline fails, and when I click on the link I am taken to a Databricks error message: "7 duplicates detected in transformed data". However, when I run the transformation cell of the notebook in question, the data it produces has no issues and contains zero duplicate rows. Another notebook that references this notebook (and is also run as part of the ADF pipeline) has a check for duplicates, and that check is what causes the ADF ingestion pipeline to fail. Since I have been unable to replicate the error or identify any duplicate rows using the SQL that is run in the Databricks notebook, can anyone advise me on what I can do within Databricks to get it to tell me which 7 rows of data are in question? Sorry if this request is a bit muddled; I am new to Databricks.&lt;/P&gt;&lt;P&gt;Thank you&lt;/P&gt;</description>
      <pubDate>Tue, 15 Jul 2025 09:45:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/duplicates-detected-in-transformed-data-help-with/m-p/125267#M47395</guid>
      <dc:creator>Firehose74</dc:creator>
      <dc:date>2025-07-15T09:45:19Z</dc:date>
    </item>
    <item>
      <title>Re: Duplicates detected in transformed data - Help with troubleshooting</title>
      <link>https://community.databricks.com/t5/data-engineering/duplicates-detected-in-transformed-data-help-with/m-p/129796#M48606</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/139430"&gt;@Firehose74&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;This may need deeper investigation and workspace access so that we can troubleshoot and review the logs. Can you please raise a ticket with us?&lt;/P&gt;</description>
      <pubDate>Tue, 26 Aug 2025 10:29:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/duplicates-detected-in-transformed-data-help-with/m-p/129796#M48606</guid>
      <dc:creator>Sidhant07</dc:creator>
      <dc:date>2025-08-26T10:29:28Z</dc:date>
    </item>
  </channel>
</rss>

