I'd like feedback from community members on the code below. It works for a single specified table, and it could be parameterized and run on a schedule.
But is this the best way to manage cleanup, i.e. to delete unwanted files of Delta tables that are stored externally in ADLS? Please let me know.
def file_exists_delete(path):
    """Remove the file at `path` if it exists; return True if it was removed."""
    try:
        dbutils.fs.ls(path)  # raises java.io.FileNotFoundException if the path is missing
        dbutils.fs.rm(path)
        print('removed the file ' + path)
        return True
    except Exception as e:
        if 'java.io.FileNotFoundException' in str(e):
            return False
        else:
            raise
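For reference, this is how I exercise the helper on its own (the ADLS path below is just a placeholder, not a real location):

# Hypothetical example path; substitute a real file under your table's location.
removed = file_exists_delete('abfss://container@account.dfs.core.windows.net/delta/tbl_name/part-00000.parquet')
print(removed)  # True if the file existed and was removed, False if it was already gone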
# Copy into a separate cell
spark.sql("OPTIMIZE tbl_name")
df = spark.sql("VACUUM tbl_name RETAIN 0 HOURS DRY RUN")  # lists files eligible for deletion without deleting them
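Note: VACUUM rejects a retention window below the default 7 days (168 hours) unless the Delta safety check is disabled for the session, so I run this first in the same notebook:

# Allow RETAIN 0 HOURS; otherwise VACUUM fails the retention duration check.
spark.conf.set('spark.databricks.delta.retentionDurationCheck.enabled', 'false')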
# Copy into a separate cell
df_collect = df.collect()
# Copy into a separate cell and execute
for row in df_collect:
    file_exists_delete(row[0])
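And this is roughly how I'd parameterize the whole flow, driving the table name from a notebook widget (the widget name and default value here are just examples):

def optimize_and_clean(table_name):
    # Compact the table, dry-run VACUUM to list stale files, then delete them.
    spark.sql(f'OPTIMIZE {table_name}')
    files_df = spark.sql(f'VACUUM {table_name} RETAIN 0 HOURS DRY RUN')
    for row in files_df.collect():
        file_exists_delete(row[0])

dbutils.widgets.text('table_name', 'tbl_name')
optimize_and_clean(dbutils.widgets.get('table_name'))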