Duplicates detected in transformed data - Help with troubleshooting

Firehose74
New Contributor III

Hello

Can anyone help with an error I am getting when running an ADF pipeline? An ingestion pipeline fails, and when I click the link I am taken to a Databricks error message: "7 duplicates detected in transformed data". However, when I run the transformation cell of the notebook in question, I see no issues with the data produced and there are zero duplicate rows. Another notebook that references this notebook (and is also run as part of the ADF pipeline) contains a check for duplicates, and that check is what causes the ADF ingestion pipeline to fail.

Since I have been unable to replicate the error or identify any duplicate rows using the SQL that is run in the Databricks notebook, can anyone advise on what I can do within Databricks to get it to tell me which 7 rows of data are being flagged? Sorry if this request is a bit muddled; I am new to Databricks.

Thank you

1 REPLY

Sidhant07
Databricks Employee

Hi @Firehose74,

This may need deeper investigation, which would require workspace access to troubleshoot and review logs. Could you please raise a ticket with us?
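
In the meantime, one thing that may be worth trying is listing the rows behind the count rather than only counting them. The sketch below is illustrative only: it assumes the downstream check counts duplicates on a set of key columns rather than on whole rows, which would explain seeing "zero duplicate rows" when comparing full rows. The table name `transformed`, the columns, and the helper `find_duplicates` are all hypothetical, and it runs against an in-memory SQLite database so that it is self-contained. The `GROUP BY ... HAVING COUNT(*) > 1` pattern itself is standard SQL, so a similar query could be run in a notebook SQL cell or through `spark.sql` against the transformed data.

```python
import sqlite3

# Hypothetical stand-in for the transformed data (names and values are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transformed (customer_id INTEGER, order_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO transformed VALUES (?, ?, ?)",
    [
        (1, "2024-01-01", 10.0),
        (1, "2024-01-01", 12.5),  # same key as the row above, different amount
        (2, "2024-01-02", 7.0),
        (3, "2024-01-03", 3.0),
        (3, "2024-01-03", 3.0),   # exact duplicate of the row above
    ],
)

# Assumed business key; use whatever columns the duplicate check groups by.
KEY_COLUMNS = ["customer_id", "order_date"]


def find_duplicates(conn, table, key_columns):
    """Return every row whose key occurs more than once, plus the copy count."""
    keys = ", ".join(key_columns)
    join_cond = " AND ".join(f"t.{c} = d.{c}" for c in key_columns)
    order_by = ", ".join(f"t.{c}" for c in key_columns)
    sql = f"""
        SELECT t.*, d.n AS copies
        FROM {table} AS t
        JOIN (
            SELECT {keys}, COUNT(*) AS n
            FROM {table}
            GROUP BY {keys}
            HAVING COUNT(*) > 1
        ) AS d ON {join_cond}
        ORDER BY {order_by}
    """
    return conn.execute(sql).fetchall()


duplicates = find_duplicates(conn, "transformed", KEY_COLUMNS)
for row in duplicates:
    print(row)
```

Note that in this sample a full-row comparison would flag only the `customer_id = 3` pair, while the key-based check also flags `customer_id = 1`. If the rows only appear as duplicates during the pipeline run, it may also help to run this kind of query inside the notebook that performs the check, since the data the pipeline sees could differ from what an interactive run of the transformation cell produces.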