Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Kearon
by New Contributor III
  • 6016 Views
  • 6 replies
  • 0 kudos

Resolved! Databricks Delta Live Table stored as SCD 2 is creating new records when no data changes. How do I stop this?

I have a streaming pipeline that ingests JSON files from a data lake using Auto Loader. These files are dumped there periodically. Most of the files contain duplicate data, but there are occasional changes. I am trying to process these files into a dat...

Latest Reply
Kearon
New Contributor III
  • 0 kudos

For clarity, here is the final code that avoids duplicates, using @Suteja Kanuri's suggestion: import dlt @dlt.table def currStudents_dedup(): df = spark.readStream.format("delta").table("live.currStudents_ingest") return ( df.drop...
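The idea behind this fix can be illustrated without Spark. The following is only a minimal plain-Python sketch, not a reconstruction of the truncated code above: it assumes hypothetical records of the form `(student_id, payload, file_timestamp)` and a hypothetical helper `drop_unchanged` that keeps a record only when its payload differs from the last one kept for the same key, so an SCD type 2 step downstream would only see real changes.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: (student_id, payload, file_timestamp).
Record = Tuple[str, str, int]


def drop_unchanged(records: Iterable[Record]) -> List[Record]:
    """Keep a record only if its payload differs from the last kept payload for that key."""
    last_payload: Dict[str, str] = {}
    kept: List[Record] = []
    # Process in arrival order so "last kept" means the most recent earlier version.
    for key, payload, ts in sorted(records, key=lambda r: r[2]):
        if last_payload.get(key) != payload:
            kept.append((key, payload, ts))
            last_payload[key] = payload
    return kept


incoming = [
    ("s1", "grade=A", 1),
    ("s1", "grade=A", 2),  # same data re-delivered in a later file
    ("s1", "grade=B", 3),  # a real change
    ("s2", "grade=C", 1),
    ("s2", "grade=C", 3),  # same data re-delivered
]

deduped = drop_unchanged(incoming)
for row in deduped:
    print(row)
```

Only the rows in `deduped` would open a new history version; the re-delivered rows are filtered out before the SCD type 2 logic runs.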

5 More Replies
weldermartins
by Honored Contributor
  • 3149 Views
  • 3 replies
  • 13 kudos

Resolved! SCD type 2

Hey guys. I don't know if I'm just tired, so I'm asking for your help: I don't understand where the difference in the number of fields comes from. Thanks! I'm replicating SCD type 2 based on this documentation: https://docs.delta.io/latest/delta-update.html#slowly-chan...

SCD 2
Latest Reply
weldermartins
Honored Contributor
  • 13 kudos

@Werner Stinckens?

2 More Replies