remove empty folders with pyspark
07-09-2024 08:13 AM
Hi,
I am trying to search a mount (mnt) point for any empty folders and remove them. Does anyone know of a way to do this? I have tried dbutils.fs.walk, but this does not seem to work.
Thanks
2 REPLIES
07-15-2024 05:41 AM
Unfortunately this reports that every folder in my mount point has a size of 0. I have folders that contain subfolders, which in turn might contain a metadata file for a streaming checkpoint.
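For reference, dbutils.fs.ls reports a size of 0 for directories regardless of what they contain, so emptiness has to be checked by listing a folder's contents rather than by its size. A minimal sketch, assuming the folder path below is just a placeholder:

entries = dbutils.fs.ls("dbfs:/mnt/your_mount_point/some_folder")
# A directory's size field is always 0, so count its entries instead
if len(entries) == 0:
    print("Folder is empty")
else:
    print(f"Folder contains {len(entries)} entries")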
01-16-2025 06:26 AM
Hello @Nathant93,
You could use dbutils.fs.ls and iterate over all the directories found to accomplish this task.
Something like this:
def find_empty_dirs(path):
    directories = dbutils.fs.ls(path)
    for directory in directories:
        if directory.isDir():
            # Recurse first so nested empty folders are removed bottom-up
            find_empty_dirs(directory.path)
            # Re-list the directory; if nothing is left after recursion, remove it
            contents = dbutils.fs.ls(directory.path)
            if len(contents) == 0:
                dbutils.fs.rm(directory.path, recurse=True)
                print(f"Removed empty directory: {directory.path}")

find_empty_dirs("dbfs:/mnt/your_mount_point")
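Because the recursion runs before the emptiness check, nested empty folders are removed bottom-up, and the mount point itself is never deleted. If you would rather preview what would be removed before touching any streaming-checkpoint folders, here is a dry-run sketch along the same lines (the mount path is a placeholder):

def report_empty_dirs(path):
    # Dry run: prints empty directories instead of deleting them.
    # Note: since nothing is deleted here, a folder whose only contents are
    # empty subfolders will not itself be reported as empty.
    for entry in dbutils.fs.ls(path):
        if entry.isDir():
            report_empty_dirs(entry.path)
            if len(dbutils.fs.ls(entry.path)) == 0:
                print(f"Would remove empty directory: {entry.path}")

report_empty_dirs("dbfs:/mnt/your_mount_point")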

