Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Databricks overwrite didn't delete previous data

Hi Databricks, we ran into the issue shown in the screenshot below:

_0-1729067185207.png

We use the PySpark API to write data to ADLS:

df.write.partitionBy("xx").option("partitionOverwriteMode","dynamic").mode("overwrite").parquet(xx)
However, when we overwrote this partition a second time on 2024-09-26 at 4:29 PM, the data from the previous write was still there, and we are not sure why.
 
The latest commit log, "_committed_3404689632661433446", is shown below. The files it removed do not belong to tid 3175486376768535369, which ran on 2024-09-26 at 4:23 PM. Instead, it removed the data file from tid 2405413862834130470, which ran on 2024-09-20.
_1-1729067620519.png
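
To compare commit logs without reading them by eye, a small helper like the one below might be useful. It is only a sketch under assumptions that are not confirmed in this post: that a "_committed_<tid>" file holds a JSON document with "added" and "removed" lists of file names, and that each data file name embeds its transaction id as "tid-<digits>". The function name summarize_commit and the example file names are hypothetical; only the tids come from the post. Please check the layout against your own files before relying on it.

```python
import json
import re

# Assumption: data file names embed the transaction id as "tid-<digits>".
TID_PATTERN = re.compile(r"tid-(\d+)")


def summarize_commit(commit_json: str) -> dict:
    """Group the files listed in a commit log by the tid found in their names.

    Assumes the commit log is JSON with optional "added" and "removed" lists.
    """
    commit = json.loads(commit_json)
    summary = {"added": {}, "removed": {}}
    for key in ("added", "removed"):
        for name in commit.get(key, []):
            match = TID_PATTERN.search(name)
            tid = match.group(1) if match else "unknown"
            summary[key].setdefault(tid, []).append(name)
    return summary


# Hypothetical commit contents: the tids are from the post, the file names are made up.
example = json.dumps({
    "added": ["part-00000-tid-3404689632661433446-example-1.c000.snappy.parquet"],
    "removed": ["part-00000-tid-2405413862834130470-example-2.c000.snappy.parquet"],
})

result = summarize_commit(example)
for tid, files in result["removed"].items():
    print(f"removed {len(files)} file(s) written by tid {tid}")
```

Running this over each "_committed_*" file in the partition directory would show which earlier transaction each overwrite actually cleaned up, and whether the files from tid 3175486376768535369 were ever listed as removed.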
 
Does anyone know the root cause, and how to remove the data that should already have been deleted? Thanks!

 

 

0 REPLIES 0
