How to find which Delta commit removed a specific file?

brickster_2018
Databricks Employee
 

brickster_2018
Databricks Employee
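import org.apache.spark.sql.functions.col

// Fill in the four values below before running.
val oldestVersionAvailable = 
val newestVersionAvailable = 
val pathToDeltaTable = ""
val pathToFileName = ""
// Scan the JSON log file of each commit for a "remove" action on the file.
(oldestVersionAvailable to newestVersionAvailable).foreach { version =>
  // Commit files are named after the version, zero-padded to 20 digits.
  val commit = spark.read.json(f"$pathToDeltaTable/_delta_log/$version%020d.json")
  // The "remove" column is present only if this commit removed at least one file.
  if (commit.columns.contains("remove")) {
    val matches = commit
      .where("remove is not null")
      .select("remove.path")
      .filter(col("path").contains(pathToFileName))
    if (matches.count > 0)
      println(s"Commit version $version removed the file $pathToFileName")
  }
}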
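The `%020d` in the log path is what turns a version number into the commit file name. As a minimal sketch, the same formatting can be pulled into a small helper and checked without a Spark session. The names `DeltaLogPaths` and `commitFile` are hypothetical, chosen only for illustration, and are not part of Delta Lake or Spark.

```scala
// Hypothetical helper (not a Delta Lake API): builds the path of the JSON
// commit file for a given table path and version.
object DeltaLogPaths {
  def commitFile(tablePath: String, version: Long): String =
    // %020d left-pads the version with zeros to a width of 20 digits.
    f"$tablePath/_delta_log/$version%020d.json"
}
```

For example, `DeltaLogPaths.commitFile("/tmp/events", 12L)` returns `/tmp/events/_delta_log/00000000000000000012.json`, where `/tmp/events` is a made-up table path.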
