Data Engineering
Listing all files under an Azure Data Lake Gen2 container

AmitSukralia
New Contributor

I am trying to find a way to list all files in an Azure Data Lake Gen2 container. I have mounted the storage account and can list the files in a folder if I know that folder's exact path (a container can have multiple levels of nested folders). But I want to list all files under all folders and subfolders in a given container. dbutils.fs.ls has no recursive option, nor does it support wildcards in the file path. How can I achieve this?

5 replies

dreich
New Contributor II

You could use the REST API.

Example here in Powershell: http://dreich.net/using-powershell-to-list-azure-datalake-gen2-contents

The only authentication method available for this is access keys.
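For a Python equivalent of the REST approach, the azure-storage-file-datalake SDK wraps the ADLS Gen2 list-paths operation and can walk a whole container recursively. The sketch below assumes that package is installed on the cluster and, as in the post, authenticates with the storage account access key. The helper names list_container_files and file_paths are hypothetical, and the account name, container and key are placeholders you supply.

```python
from typing import Iterable, List


def file_paths(paths: Iterable) -> List[str]:
    """Keep only files from an iterable of PathProperties-like objects.

    Each item is expected to expose .name and .is_directory,
    as azure.storage.filedatalake.PathProperties does.
    """
    return [p.name for p in paths if not p.is_directory]


def list_container_files(account_name: str, container: str, account_key: str) -> List[str]:
    """Recursively list every file path in an ADLS Gen2 container (hypothetical helper)."""
    # Imported lazily so file_paths() can be used without the SDK installed.
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(
        account_url=f"https://{account_name}.dfs.core.windows.net",
        credential=account_key,  # storage account access key
    )
    fs = service.get_file_system_client(file_system=container)
    # get_paths(recursive=True) returns a paged iterator that follows
    # continuation tokens until the whole container has been listed.
    return file_paths(fs.get_paths(recursive=True))
```

In Databricks the key would normally be read from a secret scope, for example dbutils.secrets.get(scope="<scope>", key="<key-name>"), rather than pasted into the notebook.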

ankitha
New Contributor II

I wrote a custom function to get all the required files. It treats the ADLS container as the root of a tree, runs "ls" on the root, then runs "ls" on its children recursively, and returns the leaf nodes (which are the required files).

The base condition for the recursion is whether the current node's path ends with "/". Directory paths do, so the function recurses into them; leaf nodes (files) never have a trailing "/" in their path.
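A minimal sketch of the tree walk described above, relying on the same convention (directory entries returned by dbutils.fs.ls end in "/"). The function name leaf_paths and its ls parameter are hypothetical additions so the walk can be exercised without a cluster; in a notebook you would pass dbutils.fs.ls.

```python
from typing import Callable, List


def leaf_paths(root: str, ls: Callable) -> List[str]:
    """Treat root as the root of a tree and return the paths of all leaf nodes (files).

    ls(path) must return entries with a .path attribute, like dbutils.fs.ls.
    """
    leaves = []
    for node in ls(root):
        if node.path.endswith("/"):
            # Directory: recurse into its children.
            leaves.extend(leaf_paths(node.path, ls))
        else:
            # Base case: a path without a trailing "/" is a file.
            leaves.append(node.path)
    return leaves
```

For example, leaf_paths("dbfs:/mnt/<mount-name>/", dbutils.fs.ls) would return every file path under the mount point.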

JithuBalan
New Contributor II

You can create a recursive function in Python inside Databricks. Something like this, which records every subdirectory together with the number of entries it contains:

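# Accumulators updated by filedetails()
num = 0            # number of subdirectories found
modifiedlist = []  # (path, name, size, number of entries) per subdirectory

def filedetails(path):
    global num
    for i in dbutils.fs.ls(path):
        # i is a FileInfo; directory names end with "/"
        if i.name.endswith("/"):
            num += 1
            lenfiles = dbutils.fs.ls(i.path)
            modifiedlist.append((i.path, i.name, i.size, len(lenfiles)))
            filedetails(i.path)  # recurse into the subdirectory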
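The function above calls dbutils.fs.ls twice for every subdirectory: once to count its entries and again inside the recursive call. It also depends on global state. A variant that lists each directory only once and returns its results might look like this; the name dir_details and the injected ls parameter are hypothetical, and the directory count is simply the length of the returned list.

```python
from typing import Callable, List, Tuple


def dir_details(path: str, ls: Callable) -> List[Tuple[str, str, int, int]]:
    """Return (path, name, size, number_of_entries) for every directory below path.

    ls(path) must return FileInfo-like entries with .path, .name and .size,
    as dbutils.fs.ls does. Each directory is listed exactly once.
    """
    details = []

    def walk(entries):
        for e in entries:
            if e.name.endswith("/"):  # directory
                children = ls(e.path)  # single listing, reused for count and recursion
                details.append((e.path, e.name, e.size, len(children)))
                walk(children)

    walk(ls(path))
    return details
```

In a notebook this would be called as dir_details("dbfs:/mnt/<mount-name>/", dbutils.fs.ls).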

StefanSchenk
New Contributor II

Here's one that might help:

def deep_ls(path: str):
    """List all files in base path recursively."""
    for x in dbutils.fs.ls(path):
        if not x.path.endswith('/'):
            yield x  # a file: directory paths end with "/"
        else:
            yield from deep_ls(x.path)  # recurse into the subdirectory

Usage:

https://gist.github.com/Menziess/bfcbea6a309e0990e8c296ce23125059
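Because dbutils.fs.ls does not accept wildcards (as noted in the question), one option is to list everything recursively and filter afterwards. A sketch using the standard fnmatch module, assuming entries with a .path attribute such as those yielded by deep_ls; matching_files is a hypothetical helper name.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def matching_files(entries: Iterable, pattern: str) -> List[str]:
    """Return the paths of entries whose file name matches a shell-style wildcard.

    Only the last path component is matched, so "*.csv" finds CSV files
    at any depth. Matching is case-sensitive on every OS.
    """
    return [
        e.path
        for e in entries
        if fnmatchcase(e.path.rstrip("/").rsplit("/", 1)[-1], pattern)
    ]
```

For example, matching_files(deep_ls("/mnt/<mount-name>/"), "*.parquet") would keep only the Parquet files found anywhere under the mount.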

Balaji_su
New Contributor II