
How to find which delta commit removed a specific file?

User16869510359
Esteemed Contributor

Accepted Solutions

User16869510359
Esteemed Contributor
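import org.apache.spark.sql.functions.col

// Fill in these four values before running.
val oldestVersionAvailable = 
val newestVersionAvailable = 
val pathToDeltaTable = ""
val pathToFileName = ""

// Scan each JSON commit file in the range for a "remove" action on the file.
(oldestVersionAvailable to newestVersionAvailable).foreach { version =>
  // Commit files are named with the version zero-padded to 20 digits.
  val commit = spark.read.json(f"$pathToDeltaTable/_delta_log/$version%020d.json")
  // The "remove" column exists only if this commit removed at least one file.
  if (commit.columns.contains("remove")) {
    val removedPaths = commit.where("remove is not null").select("remove.path")
    val matches = removedPaths.filter(col("path").contains(pathToFileName))
    if (matches.count > 0)
      println(s"Commit version $version removed the file $pathToFileName")
  }
}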
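
The f-interpolated path in the snippet above relies on Delta naming each JSON commit file after its version, zero-padded to 20 digits. A minimal sketch of just that formatting step follows. The helper name `commitFilePath` and the table path `/mnt/tbl` are hypothetical and only for illustration.

```scala
// Hypothetical helper, not part of the original answer: builds the path of
// the JSON commit file for a given version of a Delta table.
def commitFilePath(tablePath: String, version: Long): String =
  f"$tablePath/_delta_log/$version%020d.json"

// Example with an assumed table location.
val example = commitFilePath("/mnt/tbl", 12L)
// example == "/mnt/tbl/_delta_log/00000000000000000012.json"
```

This makes it easy to check that a given version's commit file still exists before reading it, since older log files may already have been cleaned up.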

