06-02-2019 04:22 AM
I am trying to find a way to list all files in an Azure Data Lake Gen2 container. I have mounted the storage account and can list the files in a folder (a container can have multiple levels of folder hierarchy) if I know its exact path. But I want something that lists all files under all folders and subfolders in a given container. dbutils.fs.ls has no recursive option, nor does it support wildcards in the file path. How can I achieve this?
09-18-2019 03:04 AM
Use REST API?
Example here in PowerShell: http://dreich.net/using-powershell-to-list-azure-datalake-gen2-contents
The only authentication available to do this is with Access Keys.
12-15-2019 09:20 PM
I wrote a custom function to get all the required files. The function treats the ADL container as the root of a tree, performs an "ls" on the root, then performs an "ls" on each of its children recursively, and returns the leaf nodes (which are the required files).
The base condition for the recursion is a check on whether the current node's path ends with a "/". Leaf nodes (files) do not have a trailing "/" in their path, whereas folders do.
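The approach described above could be sketched roughly as follows. This is not the poster's actual function, only a minimal illustration: the name `list_leaf_files`, the `Entry` tuple, and the in-memory `FAKE_TREE` are hypothetical, and the listing function is passed in as a parameter so the sketch can run outside Databricks.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical stand-in for one entry returned by dbutils.fs.ls: (path, name, size).
Entry = Tuple[str, str, int]


def list_leaf_files(root: str, ls: Callable[[str], Iterable[Entry]]) -> Iterator[Entry]:
    """Yield every file under root, descending into folders recursively."""
    for entry in ls(root):
        path = entry[0]
        if path.endswith("/"):
            # Folder paths end with "/": recurse into the folder.
            yield from list_leaf_files(path, ls)
        else:
            # No trailing "/": this is a leaf node, i.e. a file.
            yield entry


# A tiny in-memory tree, used only to exercise the function without a cluster.
FAKE_TREE = {
    "/mnt/data/": [("/mnt/data/a.csv", "a.csv", 10), ("/mnt/data/sub/", "sub/", 0)],
    "/mnt/data/sub/": [("/mnt/data/sub/b.csv", "b.csv", 20), ("/mnt/data/sub/empty/", "empty/", 0)],
    "/mnt/data/sub/empty/": [],
}


def fake_ls(path: str) -> List[Entry]:
    """Return the entries of one folder of the fake tree."""
    return FAKE_TREE[path]


found = [entry[0] for entry in list_leaf_files("/mnt/data/", fake_ls)]
print(found)
```

In a notebook, passing `dbutils.fs.ls` in place of `fake_ls` should behave the same way, assuming the returned entries can be indexed like tuples with the path first.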
02-27-2020 08:00 AM
You can create a recursive function in Python inside Databricks, something like this:
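num = 0            # number of folders visited so far
modifiedlist = []  # one (path, name, size, entry_count) tuple per folder

def filedetails(path):
    """Recursively record every folder under path with its number of entries."""
    global num
    for i in dbutils.fs.ls(path):
        # Each entry is (path, name, size, ...); folder names end with "/"
        if i[1][-1] == "/":
            num += 1
            lenfiles = dbutils.fs.ls(i[0])
            modifiedlist.append((i[0], i[1], i[2], len(lenfiles)))
            filedetails(i[0])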
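The original question also asked about wildcards, which dbutils.fs.ls does not support. One possible workaround is to collect the full paths with a recursive listing first and then filter them with the standard library's `fnmatch` module. The sketch below assumes the paths are already available as plain strings; `filter_paths` and the sample paths are hypothetical names used only for illustration.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def filter_paths(paths: Iterable[str], pattern: str) -> List[str]:
    """Keep only the paths matching a shell-style pattern (case-sensitive)."""
    return [p for p in paths if fnmatchcase(p, pattern)]


# Hypothetical paths, standing in for the output of a recursive listing.
sample = [
    "/mnt/data/2019/sales.csv",
    "/mnt/data/2019/notes.txt",
    "/mnt/data/2020/q1/sales.csv",
]

csv_files = filter_paths(sample, "*.csv")
files_2020 = filter_paths(sample, "/mnt/data/2020/*")
print(csv_files)
print(files_2020)
```

Note that in `fnmatch` patterns `*` also matches "/", so `*.csv` matches CSV files at any depth rather than only in one folder.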
02-27-2020 09:43 AM
Here's one that might help:
def deep_ls(path: str):
    """List all files in base path recursively."""
    for x in dbutils.fs.ls(path):
        if x.path[-1] != '/':
            # Not a folder: yield the file itself
            yield x
        else:
            # Folder: recurse and yield the files it contains
            yield from deep_ls(x.path)
Usage:
https://gist.github.com/Menziess/bfcbea6a309e0990e8c296ce23125059
03-22-2020 10:04 AM