Hi Databricks, we ran into an issue, as shown in the picture below:
We use the PySpark API to write data to ADLS:
df.write.partitionBy("xx").option("partitionOverwriteMode","dynamic").mode("overwrite").parquet(xx)
However, when we overwrote this partition a second time on 2024-09-26 at 4:29 PM, the previous data was still there, and we are not sure why...
The latest commit log, "_committed_3404689632661433446", is shown below: the files it removed do not belong to tid 3175486376768535369 (the write that ran on 2024-09-26 at 4:23 PM); instead, it removed the data file from tid 2405413862834130470, which ran on 2024-09-20...
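In case it helps anyone reproduce the check, this is roughly how the commit marker can be inspected. It assumes the _committed_<tid> file is a JSON object with "added"/"removed" lists of data-file names, which is what the screenshot appears to show (the exact schema is an internal detail of the Databricks commit protocol), and the mount path here is hypothetical:

```python
# Quick sketch: read a _committed_<tid> marker and print what it added/removed.
# The path and the assumed JSON layout ("added"/"removed" lists) are assumptions.
import json

commit_path = "/dbfs/mnt/target/_committed_3404689632661433446"  # hypothetical mount path

with open(commit_path) as f:
    commit = json.load(f)

print("Files added by this commit:")
for name in commit.get("added", []):
    print("  ", name)

print("Files this commit marked as removed:")
for name in commit.get("removed", []):
    print("  ", name)
```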
Does anyone know the root cause, and how to remove the data that should already have been deleted? Thanks!