Data Engineering

Forum Posts

Kearon
by New Contributor III
  • 3070 Views
  • 6 replies
  • 0 kudos

Resolved! Databricks Delta Live Table stored as SCD 2 is creating new records when no data changes. How do I stop this?

I have a streaming pipeline that ingests JSON files from a data lake using Auto Loader. The files are dumped there periodically. Most of them contain duplicate data, but there are occasional changes. I am trying to process these files into a dat...

Latest Reply
Kearon
New Contributor III
  • 0 kudos

For clarity, here is the final code that avoids duplicates, using @Suteja Kanuri's suggestion:

import dlt

@dlt.table
def currStudents_dedup():
    df = spark.readStream.format("delta").table("live.currStudents_ingest")
    return (
        df.drop...

5 More Replies
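
For anyone skimming this thread: the reply preview above is cut off, so the following is not the poster's code. It is only a plain-Python sketch of the general idea behind the fix, assuming the goal is that an SCD type 2 history opens a new version only when a tracked attribute actually changes. The names `Version`, `apply_scd2`, `tracked`, and the `id` / `file` fields are hypothetical and chosen for illustration; no Spark or DLT API is used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    key: str                    # business key of the record
    attrs: tuple                # values of the tracked attributes
    start: int                  # sequence number at which this version became current
    end: Optional[int] = None   # None means the version is still current


def apply_scd2(history, records, tracked):
    """Apply (sequence, record) pairs to an SCD2 history.

    A record whose tracked attributes match the current version for its key
    is treated as a duplicate and does not create a new version.
    """
    current = {v.key: v for v in history if v.end is None}
    for seq, rec in records:
        attrs = tuple(rec[col] for col in tracked)
        cur = current.get(rec["id"])
        if cur is not None and cur.attrs == attrs:
            continue            # same payload as the current version: skip
        if cur is not None:
            cur.end = seq       # close the previous version
        new = Version(rec["id"], attrs, seq)
        history.append(new)
        current[rec["id"]] = new
    return history


# Three files arrive; the second repeats the first except for the file name.
incoming = [
    (1, {"id": "s1", "name": "Ann", "grade": "A", "file": "f1.json"}),
    (2, {"id": "s1", "name": "Ann", "grade": "A", "file": "f2.json"}),
    (3, {"id": "s1", "name": "Ann", "grade": "B", "file": "f3.json"}),
]
for version in apply_scd2([], incoming, tracked=("name", "grade")):
    print(version)
```

The point of the sketch is that ingestion metadata (here the hypothetical `file` field) is left out of the comparison, so a re-delivered file with identical content does not produce a new history row.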
weldermartins
by Honored Contributor
  • 1469 Views
  • 3 replies
  • 13 kudos

Resolved! SCD type 2

Hey guys. Maybe I'm just tired, but I can't see where the difference in the number of fields comes from, so I'm asking for your help. Thanks! I'm replicating SCD type 2 based on this documentation: https://docs.delta.io/latest/delta-update.html#slowly-chan...

SCD 2
Latest Reply
weldermartins
Honored Contributor
  • 13 kudos

@Werner Stinckens?

2 More Replies