Resolved! Databricks Delta Live Table stored as SCD 2 is creating new records when no data changes. How do I stop this?
I have a streaming pipeline that ingests JSON files from a data lake using Auto Loader. These files are dumped there periodically. Most of the files contain duplicate data, but there are occasional changes. I am trying to process these files into a dat...
- 6016 Views
- 6 replies
- 0 kudos
Latest Reply
For clarity, here is the final code that avoids duplicates, using @Suteja Kanuri's suggestion:

import dlt

@dlt.table
def currStudents_dedup():
    df = spark.readStream.format("delta").table("live.currStudents_ingest")
    return (
        df.drop...
- 0 kudos
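
For anyone skimming this thread, the general idea behind the fix (filter out re-delivered rows before they reach the SCD 2 target) can be sketched in plain Python. This is only an illustration, not the code from the reply above, which is truncated here. The function name `drop_unchanged`, the `id`/`name` columns, and the `ingest_time` metadata column are all hypothetical.

```python
def drop_unchanged(rows, key, ignore=("ingest_time",)):
    """Keep a row only if its tracked columns differ from the
    last kept row for the same key (hypothetical helper)."""
    last_seen = {}  # key value -> tracked columns of the last kept row
    kept = []
    for row in rows:
        # Compare only business columns; ingestion metadata changes on
        # every file and would otherwise make each row look "new".
        tracked = {k: v for k, v in row.items() if k not in ignore}
        if last_seen.get(row[key]) != tracked:
            last_seen[row[key]] = tracked
            kept.append(row)
    return kept


# Hypothetical sample: the same student delivered three times.
rows = [
    {"id": 1, "name": "Ann", "ingest_time": "t1"},
    {"id": 1, "name": "Ann", "ingest_time": "t2"},   # re-delivered, no change
    {"id": 1, "name": "Anne", "ingest_time": "t3"},  # real change
]

changes = drop_unchanged(rows, key="id")
for row in changes:
    print(row)
```

Only the first and third rows survive, so a downstream SCD 2 step would open two versions instead of three. In a real pipeline the same comparison has to be expressed with streaming DataFrame operations, and the exact semantics depend on which operation you pick.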