Listing all files under an Azure Data Lake Gen2 container

AmitSukralia
New Contributor

I am trying to find a way to list all files in an Azure Data Lake Gen2 container. I have mounted the storage account and can list the files in a folder if I know its exact path (a container can have multiple levels of folder hierarchy). But I want to list all files under all folders and subfolders in a given container. dbutils.fs.ls has no recursive option and does not support wildcards in the file path. How can I achieve this?

5 REPLIES

dreich
New Contributor II

Use the REST API?

Example here in PowerShell: http://dreich.net/using-powershell-to-list-azure-datalake-gen2-contents

The only authentication method available for this is Access Keys.
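For readers working in Python rather than PowerShell, here is a minimal sketch of how the request URL for a recursive listing might be built. It assumes the ADLS Gen2 "Path - List" operation with its `resource=filesystem` and `recursive` query parameters; the function name and the account and container values are hypothetical, and the authentication headers are deliberately left out.

```python
from urllib.parse import urlencode


def build_list_paths_url(account: str, filesystem: str,
                         directory: str = "", recursive: bool = True) -> str:
    """Build the URL for an ADLS Gen2 'Path - List' request (no auth headers)."""
    # resource=filesystem selects the listing operation on the container.
    params = {"resource": "filesystem", "recursive": str(recursive).lower()}
    if directory:
        # Optional: restrict the listing to one folder inside the container.
        params["directory"] = directory
    return f"https://{account}.dfs.core.windows.net/{filesystem}?{urlencode(params)}"


# Hypothetical account and container names, for illustration only.
print(build_list_paths_url("myaccount", "mycontainer"))
```

The request still has to be signed or otherwise authorized before it is sent, and large listings come back in pages, which this sketch does not handle.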

ankitha
New Contributor II

I wrote a custom function to get all the required files. The function treats the ADLS container as the root of a tree, performs an "ls" on the root, then performs an "ls" on each of its children recursively, and returns the leaf nodes (which are the required files).

The recursive function checks whether the current node's path ends with a "/". If it does, the node is a folder and the function recurses into it; otherwise the node is a leaf. No leaf node in the document structure has a path ending in "/".
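The approach described above might look roughly like the following. This is a sketch, not the poster's actual function: the name `list_leaf_files` is made up, and the listing function is passed in as a parameter (in a notebook that would be `dbutils.fs.ls`) so the logic can also be tried outside Databricks.

```python
from typing import Callable, Iterable, List


def list_leaf_files(root: str, ls: Callable[[str], Iterable]) -> List[str]:
    """Return the paths of all files under root, walking folders recursively.

    `ls` must return entries with a `.path` attribute; an entry whose
    path ends with "/" is treated as a folder, anything else as a file.
    """
    files = []
    for entry in ls(root):
        if entry.path.endswith("/"):
            # Folder: descend and collect its leaves.
            files.extend(list_leaf_files(entry.path, ls))
        else:
            # Leaf node: a file.
            files.append(entry.path)
    return files
```

In a Databricks notebook this could be called as `list_leaf_files("/mnt/<mount-name>/", dbutils.fs.ls)`, where the mount path is a placeholder for your own.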

JithuBalan
New Contributor II

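You can create a recursive function in Python inside Databricks, something like this:

num = 0            # number of directories visited
modifiedlist = []  # one (path, name, size, child count) tuple per directory

def filedetails(path):
    """Recursively record every directory under path with its number of children."""
    global num
    for i in dbutils.fs.ls(path):
        # Directory entries have a name ending in "/".
        if i.name.endswith("/"):
            num += 1
            children = dbutils.fs.ls(i.path)
            modifiedlist.append((i.path, i.name, i.size, len(children)))
            filedetails(i.path)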

StefanSchenk
New Contributor II

Here's one that might help:

def deep_ls(path: str):
    """List all files in base path recursively."""
    for x in dbutils.fs.ls(path):
        # Compare with != rather than "is not": identity checks on strings are unreliable.
        if x.path[-1] != '/':
            yield x
        else:
            yield from deep_ls(x.path)

Usage:

https://gist.github.com/Menziess/bfcbea6a309e0990e8c296ce23125059
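Since the original question also asked about wildcards, one possible way to layer pattern matching on top of a recursive listing is sketched below. The helper `filter_by_pattern` is a hypothetical name, not part of `dbutils`; it only assumes each entry has a `.path` attribute, as the entries returned by `dbutils.fs.ls` do.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Iterator


def filter_by_pattern(entries: Iterable, pattern: str) -> Iterator:
    """Yield only the entries whose .path matches a shell-style wildcard pattern."""
    for entry in entries:
        # fnmatchcase is case-sensitive on every platform; "*" also matches "/".
        if fnmatchcase(entry.path, pattern):
            yield entry
```

Combined with the generator above, something like `filter_by_pattern(deep_ls("/mnt/<mount-name>/"), "*.csv")` would yield only the CSV files; the mount path is a placeholder.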

Balaji_su
New Contributor II
